this on Windows under Cygwin, then in your config files you MUST NOT use
the Cygwin paths (like /cygdrive/d/...), because Java can't see them.
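For illustration, the mapping can be done with Cygwin's own `cygpath -w` tool, or sketched in code. This helper and its name are hypothetical, a minimal sketch of the /cygdrive translation, not part of Nutch:

```python
import re

def cygdrive_to_windows(path):
    # Map a Cygwin /cygdrive path to the native Windows form Java expects.
    m = re.match(r"^/cygdrive/([a-zA-Z])(/.*)?$", path)
    if not m:
        return path  # not a /cygdrive path, leave it alone
    drive, rest = m.group(1).upper(), m.group(2) or "/"
    return drive + ":" + rest.replace("/", "\\")

print(cygdrive_to_windows("/cygdrive/d/nutch/crawl"))  # D:\nutch\crawl
```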
--
Best regards,
Andrzej Bialecki
___. ___ ___ ___ _ _ __
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
path).
--
that this is an early
preview. Also, various UI glitches are probably related to the Thinlet
toolkit - again, one day I may re-write Luke using something else, but
for now I don't have the strength to do it. :)
--
, or the problem I mentioned above.
--
insist on
saying that it's RUDE to do this.
Anyway, Google monitors such attempts, and after you issue too many
requests your IP will be blocked for a while - so whether you go the
polite or the impolite way, you won't be able to do this.
--
numThreads * numMapTasks per node.
So be careful to set it to a number that doesn't overwhelm your network ;)
--
. I strongly recommend setting up a local caching DNS.
--
.
--
be enough.
--
is to be able to phase out old segments: you can be sure that it is safe
to delete segments older than N days, because all their pages will have
been rescheduled for refetching and will be found in a newer segment.
--
are not really the
same page, so you need to be careful ...
--
, and not the content of
crawldb.
The command 'bin/nutch readseg -dump segmentName output' should do
the trick.
--
dependencies needed to run Nutch except for Hadoop libraries (which are
also required).
--
of fetching.
--
will dump just the content part:
./bin/nutch readseg -dump crawl/segments/20090903121951 toto -nofetch
-nogenerate -noparse -noparsedata -noparsetext
--
in
the logs why they aren't.
See above.
--
Eric wrote:
Does anyone know if it possible to target only certain links for
crawling dynamically during a crawl? My goal would be to write a plugin
for this functionality but I don't know where to start.
URLFilter plugins may be what you want.
--
multiple segments from one job, but it's not implemented yet.
--
the parsing and updatedb just from these segments,
without waiting for all 16 segments to be processed.
--
a special flag (in metadata) that prevents fetching. This
requires that you implement a custom scoring plugin.
--
to implement than a ScoringFilter.
--
QueryFilter plugin.xml you declare that QueryParser should
pass your special fields without treating them as terms, and in the
implementation you create a BooleanClause to be added to the translated
query.
--
is this particular site, then you know the positions of
navigation items, right? Then you can remove these elements in your
HtmlParseFilter, or modify DOMContentUtils (in parse-html) to skip these
elements.
--
of the smallest blocks, where link number is high
- these are likely navigational elements.
* reconstruct the whole page from the remaining blocks.
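As a rough sketch of that heuristic (all names and thresholds below are made up for illustration, not Nutch code): score each block by its length and link count, drop the short link-dense ones, and rejoin the rest:

```python
def strip_nav_blocks(blocks, max_len=200, min_links=3):
    # blocks: list of (text, link_count) in document order.
    # Drop short blocks dense with links (likely navigation),
    # then reconstruct the page from the remaining blocks.
    kept = [text for text, links in blocks
            if not (len(text) < max_len and links >= min_links)]
    return "\n".join(kept)

page = strip_nav_blocks([
    ("Home | News | Contact", 3),               # short + link-heavy: dropped
    ("A long article paragraph ... " * 20, 1),  # real content: kept
])
```

Real implementations weigh link density (link text length vs. total text length) rather than a raw link count, but the shape is the same.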
--
/nutch/trunk
--
Eric Osgood wrote:
So the trunk contains the most recent nightly update?
It's the other way around - the nightly build is created from a snapshot
of the trunk. The trunk is always the most recent.
--
for the server side.
--
I agree with Dennis - use Nutch if you need to do a larger-scale
discovery such as when you crawl the web, but if you already know all
target pages in advance then Solr will be a much better (and much easier
to handle) platform.
--
is valid, and cannot be written to.
Are you sure you are running a single datanode process per machine?
--
- when crawling filesystems
each file in a directory is treated as an outlink, and this limit is
then applied.
--
, to keep
track of the parent URL. The rest should be handled automatically,
although there are some other complications that need to be handled as
well (e.g. don't recrawl sub-documents).
--
?
It is. This problem is rare - I think I have crawled cumulatively ~500
million pages in various configs, and it never happened to me personally.
It requires a few things to go wrong (see the issue comments).
--
missed here? Does Nutch allow us to put the index
on a network location?
UNC paths are not supported in Java - you need to mount this location as
a local volume.
--
;
}
--
such URLs directly from CrawlDb (using e.g.
CrawlDbReader API) and then uses SolrJ API to send the same delete
requests + commit.
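The SolrJ route described above boils down to posting a standard delete-by-id update message to Solr. A minimal sketch of building that message (the example URLs and the helper name are made up; Nutch-built indexes use the page URL as the document id):

```python
from xml.sax.saxutils import escape

def solr_delete_message(urls):
    # Build the <delete> XML update message for Solr's /update handler.
    ids = "".join("<id>%s</id>" % escape(u) for u in urls)
    return "<delete>%s</delete>" % ids

msg = solr_delete_message(["http://example.com/a", "http://example.com/b"])
# POST msg to the /update handler, then send <commit/> to make it visible.
```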
--
Gora Mohanty wrote:
On Mon, 26 Oct 2009 17:26:23 +0100
Andrzej Bialecki a...@getopt.org wrote:
[...]
Stale (no longer existing) URLs are marked with STATUS_DB_GONE.
They are kept in Nutch crawldb to prevent their re-discovery
(through stale links pointing to these URL-s from other pages
--
the longest is assigned a lot of URLs from a
single host.
A workaround for this is to limit the max number of URLs per host (in
nutch-site.xml) to a more reasonable number, e.g. 100 or 1000, whatever
works best for you.
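If memory serves, the relevant property is generate.max.per.host (verify the exact name against the nutch-default.xml of your version). An override in nutch-site.xml would look like:

```xml
<!-- nutch-site.xml: cap URLs selected per host in a single fetchlist -->
<property>
  <name>generate.max.per.host</name>
  <value>1000</value>
</property>
```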
--
.
* minor issue - when specifying the path names of segments and crawldb,
do NOT append the trailing slash - it's not harmful in this particular
case, but you could have a nasty surprise when doing e.g. copy / mv
operations ...
--
without actually using the language-identifier plugin?
You need to add the language-identifier plugin to the requires section
in your plugin.xml, like this:
<requires>
   <import plugin="nutch-extensionpoints"/>
   <import plugin="language-identifier"/>
</requires>
--
respond to it from the same
email account that you were subscribed from?
--
Andrzej Bialecki wrote:
doesn't work, as reported by me and others last week.
Thanks,
Did you get the message with the subject "confirm unsubscribe" from
nutch-user@lucene.apache.org, and did you respond to it from the same
email account that you were subscribed from?
.. I just verified
), and you can use its API to retrieve either all or
individual records from a segment (using URL as key).
--
platform to develop and experiment with such components.
-
Briefly ;) that's what comes to my mind when I think about the future of
Nutch. I invite you all to share your thoughts and suggestions!
--
the index.
--
of the file://
schema FileSystem. Now you probably forgot to put hadoop-default.xml on
your classpath. Go to Build Path and add this file to your classpath,
and all should be ok.
--
them to use different ports AND different local paths.
--
depends on the last
modified timestamp being present on the webpage that is being crawled,
which I believe is not mandatory. Still those who do set it would benefit.
This is already implemented - see the Signature / MD5Signature /
TextProfileSignature.
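Switching to the more relaxed signature is a one-property change in nutch-site.xml (class name as in Nutch 1.x; verify against your version):

```xml
<!-- nutch-site.xml: tolerate near-duplicate pages when computing signatures -->
<property>
  <name>db.signature.class</name>
  <value>org.apache.nutch.crawl.TextProfileSignature</value>
</property>
```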
--
characters
outside this encoding will be replaced by question marks.
If you want to get an exact copy of the raw binary content then please
use the SegmentReader API.
--
are their victims). The source code is there; if you choose, you can
modify it to bypass these restrictions - just be aware of the
consequences (and don't use Nutch as your user agent ;) ).
--
) - and I agree that we
should have a 1.1 release in the near future.
--
own extended DB-s.
--
Dennis Kubes wrote:
I would like to get a couple things in this release as well. Let me
know if you want help with the upgrade.
You mean you want to do the Hadoop upgrade? I won't stand in your way :)
--
!
--
, indeed this looks like a bug - we should instead do like this:
if (datum.getFetchInterval() > maxInterval) {
  datum.setFetchInterval(maxInterval * 0.9f);
}
--
in
your crawldb.
--
relaxed Signature implementation, e.g.
TextProfileSignature.
--
the db in order to update the signatures.
--
logging. ;)
--
unique hosts are in the current
working set.
--
which
thread you replied to and your question is hidden in that thread and
gets less attention. It makes following discussions in the mailing
list archives particularly difficult.
--
are unpredictable.
--
tasks tend to hang around, but still some of them finish and make space
for new tasks. As time goes on, the majority of your tasks become slow
tasks, so the overall speed continues to drop.
--
week I will be working on integrating the patches from Julien, and
if time permits I could perhaps start working on a speed monitoring to
lock out slow servers.
--
that the process is in a (single) reduce phase sorting the data - with
larger jobs in local mode the sorting phase may take a very long time
due to heavy disk IO (and in disk-wait state it may be uninterruptible).
Try to generate a thread dump to see what code is being executed.
--
Paul Tomblin wrote:
On Sat, Nov 28, 2009 at 5:48 PM, Andrzej Bialecki a...@getopt.org wrote:
Paul Tomblin wrote:
-bash-3.2$ jstack -F 32507
Attaching to process ID 32507, please wait...
Hm, I can't see anything obviously wrong with that thread dump. What's the
CPU and swap usage
partial indexes, so you need to specify each /part- dir as an
input to dedup.
--
.
--
during
generation. See ScoringFilter.generatorSortValue(..), you can modify
this method in scoring-opic (or in your own scoring filter) to
prioritize certain urls over others.
--
, please create a JIRA issue in Nutch, and attach
the patch.
--
the nutch*.job
to a separate Hadoop cluster? Could you please try it with a standalone
Hadoop cluster (even if it's a pseudo-distributed, i.e. single node)?
--
, that contain an outlink to that page.
Very good explanation - that's exactly the reason why Nutch never
discards such pages. If you really want to ignore certain pages, then
use URLFilters and/or ScoringFilters.
--
that changes the matching urls to e.g. always lose
the 'www.' part.
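With the urlnormalizer-regex plugin, such a rule goes into regex-normalize.xml; the pattern below is an illustration, so test it against your own URL set before relying on it:

```xml
<!-- regex-normalize.xml: collapse "www." and bare-host variants to one URL -->
<regex>
  <pattern>^(https?://)www\.</pattern>
  <substitution>$1</substitution>
</regex>
```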
--
part-N partial indexes).
--
On 2009-12-14 16:05, BrunoWL wrote:
Nobody?
Please, any answer would good.
Please check this issue:
https://issues.apache.org/jira/browse/NUTCH-479
That's the current status, i.e. this functionality is available only as
a patch.
--
should commit the change?
Thanks for reporting this - could you perhaps try to apply that patch
and see if it helps? I hesitated to commit it because it's really a
workaround and not a solution ... but if it works for you then it's
better than nothing.
--
that you are looking for is an IndexingFilter - this
receives a copy of the document with all fields collected just before
it's sent to the indexing backend - and you can freely modify the
content of NutchDocument, e.g. do additional analysis, add/remove/modify
fields, etc.
--
On 2009-12-22 16:07, Claudio Martella wrote:
Andrzej Bialecki wrote:
On 2009-12-22 13:16, Claudio Martella wrote:
Yes, I'am aware of that. The problem is that i have some fields of the
SolrDocument that i want to compute by text analysis (basically i want
to do some smart keywords extraction
(2 documents), and if the problem persists please report it in JIRA.
--
linkdb with new
links from a new segment.
--
in development test phases, less in
production though.
Right. Also, a common practice is to keep the raw data for a while just
to make sure that the parsing and indexing went smoothly (in case you
need to re-parse the raw content).
--
is configurability - if you put this code in a separate plugin,
you can easily turn it on/off, but if it sits in HtmlParser this would
be more difficult to do.
--
is slightly less expressive but much
much faster.
--
On 2010-01-15 20:09, MilleBii wrote:
Inject is meant to seed the database at the start.
But I would like to inject new urls on a production crawldb, I think it
works but I was wondering if somebody could confirm that.
Yes. New urls are merged with the old ones.
--
hdfs.DFSClient - DFS Read:
java.io.IOException: Could not obtain block: blk_-6931814167688802826_9735
file=/user/root/crawl/indexed-segments/20100117235244/part-0/_1lr.prx
This error is commonly caused by running out of disk space on a datanode.
--
On 2010-02-09 03:08, Hua Su wrote:
Thanks. But heritrix is another project, right?
Please see this Git repository, it contains the latest work in progress
on Nutch+HBase:
git://github.com/dogacan/nutchbase.git
--
params,
such as sessionId, print=yes, etc) or completely unrelated (human
errors, peculiarities of the content management system, or mirrors). In
your case it seems that the same page is available under different
values of g2_highlightId.
--
On 2010-02-20 23:32, reinhard schwab wrote:
Andrzej Bialecki schrieb:
On 2010-02-20 22:45, reinhard schwab wrote:
the content of one page is stored even 7 times.
http://www.cinema-paradiso.at/gallery2/main.php?g2_page=8
i believe this comes from
Recno:: 383
URL::
http://www.cinema-paradiso.at
no longer exists. Sorry :( However, you can
still check out that code from CVS repository at nutch.sf.net .
--
/boilerpipe/ .
--
still a few months away.
--
, is there really no content for the redirected
url?
--
generating the response ...
it was a total mess.
So, if you target 10 sites, you can make it work. If you target 10,000
sites all using slightly different methods, then forget it.
--
, it's complex and fragile.
--
define these weights in the configuration, look for query boost
properties.