If this was done after you indexed your content, then you will need to
reindex all of your content to make this field searchable in your Solr
index.
On Mon, Jan 16, 2012 at 5:31 AM, Vijith vijithkv...@gmail.com wrote:
Hi Lewis,
Yes, it was when I added a field like -
field dest=keywords
I'm indexing it right away while I am crawling (using -solr). I am
using the 'crawl' command. Should I use individual commands for
inject, fetch, etc.?
I clear off the crawl data and Solr index before I crawl. Any clue?
On Mon, Jan 16, 2012 at 1:48 PM, Lewis John Mcgibbney
You would need a parsing fetcher for this to work. Also the fetch filter may
offer some insights.
https://issues.apache.org/jira/browse/NUTCH-828
We do similar things with outlinks while fetching.
Hi Lewis,
Thanks for the reply. What I really want to achieve is to find
the occurrence of
hi
Hi,
I started having this problem recently. For some reason, I did not have it
before, when working with Nutch 1.4 pre-release code. The stack trace
would be:
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSplits(SolrDeleteDuplicates.java:200) at
Thanks Markus. I think that will give me a good starting point.
On Mon, Jan 16, 2012 at 2:11 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
You would need a parsing fetcher for this to work. Also the fetch filter may
offer some insights.
https://issues.apache.org/jira/browse/NUTCH-828
Hello all,
one of the sites I'm crawling doesn't have a robots.txt file, so I decided
to modify RobotRulesParser.java to give it default rules (EMPTY_RULES).
But apparently, Nutch doesn't crawl it properly.
Is this the correct way to handle it?
Is there a better alternative?
Remi
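For context, the usual robots.txt convention treats a missing file as "no restrictions". The sketch below illustrates that convention with Python's standard urllib.robotparser. It is not Nutch code, and whether Nutch's RobotRulesParser ends up with the same behaviour for this site is an assumption to verify. The example.com URLs and the "mycrawler" agent name are placeholders.

```python
from urllib.robotparser import RobotFileParser


def build_rules(lines):
    """Parse robots.txt content given as a list of lines."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser


# An empty rule set, standing in for a site with no robots.txt (assumption).
empty_rules = build_rules([])

# A rule set that blocks one directory, for contrast.
restricted = build_rules(["User-agent: *", "Disallow: /private/"])

print(empty_rules.can_fetch("mycrawler", "http://example.com/any/page.html"))
print(restricted.can_fetch("mycrawler", "http://example.com/private/page.html"))
print(restricted.can_fetch("mycrawler", "http://example.com/public/page.html"))
```

If empty rules already allow everything, the poor crawl may have a different cause than the missing robots.txt.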
Hello all,
I'm getting an invalid URI error with some links that contain three dots, i.e.
They work perfectly well in browsers (IE and Chrome) but,
apparently, not with Nutch.
Is this a known issue? Any idea on how to handle it?
Remi
OK, for the time being I'll stand by and wait for a solution. This is way beyond
my competence :-(
On Thu, Jan 12, 2012 at 11:47 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Remi,
WRT fixing Nutch 1.2 I can't comment, we do not support this version any
longer and it is no longer
It comes under the error java.lang.IllegalArgumentException
On Mon, Jan 16, 2012 at 3:58 PM, remi tassing tassingr...@gmail.com wrote:
Hello all,
I'm getting an invalid URI error with some links that contain three dots, i.e.
They work perfectly well in browsers (IE and Chrome) but,
Please copy the stack trace.
On Monday 16 January 2012 14:58:46 remi tassing wrote:
Hello all,
I'm getting an invalid URI error with some links that contain three dots, i.e.
They work perfectly well in browsers (IE and Chrome) but,
apparently, not with Nutch.
Is this a known issue? Any idea
Hello,
this is a snapshot of the log:
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
Hello,
after crawling completed, I dumped the crawled URLs with the following
command:
bin/nutch readdb crawl/crawldb -dump output
With 170 crawled URLs, only one shows as db_fetched. That's why I think
something is wrong.
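To see how the other URLs are distributed across statuses, the dump can be tallied. The following is a minimal sketch, assuming each record in the dump contains a line of the form "Status: N (db_xxx)"; check that against your own dump before relying on it. The sample records and the count_statuses helper are illustrative, not taken from the thread.

```python
import re
from collections import Counter

# Assumed record line format in the readdb dump: "Status: 2 (db_fetched)"
STATUS_LINE = re.compile(r"^Status:\s*\d+\s*\((\w+)\)")


def count_statuses(lines):
    """Count how many dumped records fall under each status name."""
    counts = Counter()
    for line in lines:
        match = STATUS_LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Made-up sample records, for illustration only.
sample = [
    "http://example.com/\tVersion: 7",
    "Status: 2 (db_fetched)",
    "http://example.com/a\tVersion: 7",
    "Status: 1 (db_unfetched)",
    "http://example.com/b\tVersion: 7",
    "Status: 1 (db_unfetched)",
]

for status, total in sorted(count_statuses(sample).items()):
    print(status, total)
```

On a real dump, pass the open file instead of the sample list, for example the part file inside the output directory (its exact name is an assumption here). Running bin/nutch readdb crawl/crawldb -stats should give a similar per-status summary directly.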
When I asked for the correct way to handle this, I meant what is
On Monday 16 January 2012 15:17:21 remi tassing wrote:
Hello,
after crawling completed, I dumped the crawled URLs with the following
command:
bin/nutch readdb crawl/crawldb -dump output
With 170 crawled URLs, only one shows as db_fetched. That's why I think
something is wrong.
The
This? https://uri1...From=stats
That's not a correct or valid URL if you ask me.
On Monday 16 January 2012 15:12:51 remi tassing wrote:
Hello,
this is a snapshot of the log:
-activeThreads=10, spinWaiting=9, fetchQueues.totalSize=96
-activeThreads=10, spinWaiting=9,
Hello Markus,
thanks for the help!
Just to clarify a little bit: in my previous message, uri1 represented a
normal, ordinary URL; I just didn't want to copy the exact URL.
The weird part is that it all works in the browser...
On Mon, Jan 16, 2012 at 4:35 PM, Markus Jelsma
hi
Hi,
I started having this problem recently. For some reason, I did not have it
before, when working with Nutch 1.4 pre-release code. The stack trace
would be:
org.apache.nutch.indexer.solr.SolrDeleteDuplicates$SolrInputFormat.getSpli