Hi Prashant,
Please take a look at either the Nutch or the Hadoop user@ lists. I've
seen and reported on this previously so it should not be too hard to
find.
hth
Lewis
On Wed, Nov 14, 2012 at 6:07 PM, Prashant Ladha
prashant.la...@gmail.com wrote:
Hi,
I am trying to set up Nutch via Eclipse.
Additionally, please see this issue below and if you are able please
provide feedback based on the patch.
https://issues.apache.org/jira/browse/NUTCH-1486
hth
Lewis
On Tue, Nov 13, 2012 at 8:57 AM, Ferdy Galema ferdy.gal...@kalooga.com wrote:
I'm not a regular Solr user, but here are some
Nice one, gentlemen, thank you very much.
Best
Lewis
On Tue, Nov 13, 2012 at 11:39 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
In trunk you can use the Inlink and Inlinks classes. The first for each
inlink and the latter to add the Inlink objects to.
Inlinks inlinks = new Inlinks()
your experience on the issue would be excellent.
Best
Lewis
On Tue, Nov 13, 2012 at 1:13 PM, Erol Akarsu eaka...@gmail.com wrote:
Lewis,
Have you checked it in to SVN? Where will I get this patch?
Erol Akarsu
On Tue, Nov 13, 2012 at 6:57 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com
Hi,
On Tue, Nov 13, 2012 at 2:36 PM, Erol Akarsu eaka...@gmail.com wrote:
Lewis,
I applied the patch you told me about. I replaced the schema.xml of the Solr 4 installation
with schema-solr4.xml. The Solr 4.0 system is up and running and I can see its
web page at http://localhost:8080/sol40.
You would need to
Hi,
On Tue, Nov 13, 2012 at 3:45 PM, Erol Akarsu eaka...@gmail.com wrote:
Where is this script? bin folder has only nutch script.
https://svn.apache.org/repos/asf/nutch/branches/2.x/src/bin/crawl
I am using nutch 2.1 not trunk. Does it make any difference on behavior of
nutch script?
I
Hi,
On Tue, Nov 13, 2012 at 4:22 PM, Erol Akarsu eaka...@gmail.com wrote:
Nov 13, 2012 11:11:48 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Document contains multiple
values for uniqueKey field: id=[org.apache.nutch:http/, ]
The
Hi,
Can you please open an issue for this. I can confirm that without
adding some additional dependencies I get the following when
attempting to parse an rss feed [0] which I have saved locally.
lewis@lewis-desktop:~/ASF/trunk/runtime/local$ ./bin/nutch plugin feed
Hi Julien,
Link to from wiki maybe?
Safe journey home.
Lewis
On Fri, Nov 9, 2012 at 9:44 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Hi guys,
For those of you who could not make it to the ApacheCon in Sinsheim, here
are the slides of my talk on Nutch
Hi All,
We recently committed a rather major patch over in GORA which now
provides a WebServices API, enabling Gora to persist data into
(currently supported) Amazon's DynamoDB [0]. Other WebServices such as
Google App Engine and Microsoft Azure, etc have also been discussed
but these will be
Hi Kiran,
Thanks for the persistence on this one, it is greatly appreciated.
Please feel free to open an issue... may be best to even open over on
the GORA jira?
Best
Lewis
On Mon, Nov 5, 2012 at 2:50 PM, kiran chitturi
chitturikira...@gmail.com wrote:
I have just tested with Hbase as
http://hadoop.apache.org/docs/r1.0.3/commands_manual.html#Generic+Options
hth
On Sun, Nov 4, 2012 at 9:15 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
Just try it. With -D you can override Nutch and Hadoop configuration
properties.
-Original message-
From:Joe Zhang
Hi Kiran,
Have you treated Nutch ivy.xml with the gora-hbase artifact then
compile the code?
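For reference, a rough sketch of what enabling that artifact in ivy/ivy.xml might look like. The rev value here is an assumption and should match the Gora version your Nutch release ships with:

```xml
<!-- ivy/ivy.xml: uncomment (or add) the gora-hbase dependency.
     The rev shown is an assumption; use the Gora version that
     matches your Nutch release. -->
<dependency org="org.apache.gora" name="gora-hbase" rev="0.2.1" conf="*->default" />
```

After editing the file, rebuild with ant runtime so the artifact is resolved and copied into the runtime.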
The tutorial most certainly works.
Lewis
On Thu, Nov 1, 2012 at 9:59 PM, kiran chitturi
chitturikira...@gmail.com wrote:
Hi,
I am trying to configure Nutch GORA with Hbase as shown in the tutorial
Hi,
On Fri, Nov 2, 2012 at 2:43 PM, kiran chitturi
chitturikira...@gmail.com wrote:
I am not sure what versions of HBase are compatible with Nutch
I would advise you to read the Nutch2Tutorial again.
Install and configure HBase. You can get it here (N.B. Gora 0.2 uses
HBase 0.90.4, however
Hi,
On Fri, Nov 2, 2012 at 5:36 PM, cocofan coco...@mailbolt.com wrote:
2012-11-01 14:46:52,027 ERROR security.UserGroupInformation -
PriviledgedActionException as:cocofan
I've never seen this Exception before...honestly.
cause:org.apache.hadoop.mapreduce.lib.input.InvalidInputException:
Hi Dan,
Actually no. It was a project I never ended up getting my teeth into (sigh).
I am going to try this later on today though, so keep this thread
alive and we will see where it goes.
Lewis
On Thu, Nov 1, 2012 at 10:00 AM, dan danv...@gmail.com wrote:
Hi lewis,
any success with this one
Hi Kiran,
Did you ever get anywhere with this one?
Lewis
On Tue, Oct 16, 2012 at 10:30 PM, kiran chitturi
chitturikira...@gmail.com wrote:
Hi,
I am using Nutch 2.x series with updated tika dependencies with hsql
database.
I ran the commands 'inject, generate, fetch' and after that when
Nice one Julien. It's nothing short of a privilege to be part of the various
communities and working alongside you guys.
Have a great night.
Lewis
On Thu, Nov 1, 2012 at 11:39 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Hi all,
Apologies for cross posting. Srini Penchikala has
I really think this should be in the FAQ's?
http://wiki.apache.org/nutch/FAQ
On Fri, Oct 26, 2012 at 2:10 PM, Markus Jelsma
markus.jel...@openindex.io wrote:
Hi,
You cannot recover the mapper output as far as i know. But anyway, one should
never have a fetcher running for three days. It's
Hi,
On Thu, Oct 25, 2012 at 3:03 PM, manubharghav manubharg...@gmail.com wrote:
Will providing a core-site.xml overriding some of the permissions in
core-default.xml in the hadoop jar help?
It's certainly something I would try.
Also have you tried using the Nutch script at all? If you can get
Hi,
On Tue, Oct 23, 2012 at 2:42 PM, Mouradk mourad...@gmail.com wrote:
This sits in a urls/seed.txt in NUTCH_HOME (not runtime folder but the home
folder generated after unzipping).
Please put the urls directory (with the seed file for bootstrapping)
into /runtime/local and run the command
Hi,
On Tue, Oct 23, 2012 at 11:53 AM, Mouradk mourad...@gmail.com wrote:
I uploaded Nutch 2.1 and tried to get it started but no luck so far. I am
running it on local with Hbase 0.90.6.
HBase compatibility should be fine. In all honesty we *should*
probably upgrade to one of the newer
Hi, Stefan,
To date this is not implemented. I would suggest that this is the case
due to the requirement to design custom crawls. It would be relatively
trivial to get it dumped from within your crawl script.
Lewis
On Tue, Oct 23, 2012 at 2:04 PM, Stefan Scheffler
sscheff...@avantgarde-labs.de
Hi James,
On Mon, Oct 22, 2012 at 1:28 AM, j.sulli...@thomsonreuters.com wrote:
I've figured this out...somewhat. The issue causing the error was that I was
running MySQL with UTF-8 as default and needed to increase the size of the
primarykey column in gora-sql-mapping.xml to 768 (which I
Hi,
On Fri, Oct 19, 2012 at 6:23 PM, sumarlidason sumarlida...@gmail.com wrote:
So, I made some changes to gora.properties, and now I'm getting a null pointer
exception.
Do you wish to detail when and how you are getting this Exception?
It almost seems to me at this point nutch needs a DB for
Hi James,
Have you attempted to make any changes to the host table config in
gora-sql-mapping?
Lewis
On Fri, Oct 19, 2012 at 10:26 AM, j.sulli...@thomsonreuters.com wrote:
Could somebody confirm if the bin/nutch readhostdb command works with MySQL.
I am trying to figure out if it is broke
Hi,
There are a number of major issues with your attempts to get Nutch working.
Please check out our wiki for tutorials on Nutch.
Only Nutch distributions obtained from the official Apache resources
are supported e.g. mirrors... and development versions available from
our SVN area. All of these
Hi Alex,
I've seen similar exceptions numerous times [0] when running the Gora
test suite against HBase however this _always_ occurred against an
HBase version other than the officially supported version of HBase
(which is 0.90.4) when behind a local proxy so I am immediately
tempted to speculate
Hi Kiran,
If you apply the patch to your 2.x branch, then make sure that 'ant
runtime' is executed. Please also make sure that the tika 1.1
dependency _does_not_ exist in your runtime /lib directory as this may
conflict with expected results.
If you could update the ticket it would be excellent.
Hi,
I would also direct you at an issue [0] and set of patches for trunk
and 2.x which use the inet socket to obtain the host IP address if
this is required. It would not be very difficult to get this patch to
also/either obtain the response time I would not think...
Also if anyone feels like
Hi Guys,
I'm sure that these issues should be logged in our Jira as they not
only sound serious but also ship with reasonable sounding possible
solutions.
If any of you feel like opening a ticket(s), it would be great...
patches are always welcome.
Lewis
On Sat, Oct 13, 2012 at 12:14 AM, Tejas
Hi Tolga,
Please take this to the Solr user@ list.
Thank you
Lewis
On Tue, Oct 16, 2012 at 12:13 PM, Tolga to...@ozses.net wrote:
Hi,
I've tried url:fass\.sabanciuniv\.edu AND content:this, and I got results
from both my URLs. What to do?
Regards,
On 10/13/2012 12:48 AM, Alejandro
comment to head over to Solr lists...
hth
Lewis
On Tue, Oct 16, 2012 at 2:01 PM, Tolga to...@ozses.net wrote:
Solr sent me to Nutch list, but okay. Thanks,
On 10/16/2012 02:27 PM, Lewis John Mcgibbney wrote:
Hi Tolga,
Please take this to the Solr user@ list.
Thank you
Lewis
On Tue
Hi,
After every crawl iteration check out your webdb with the readdb tool.
There is plenty linked to from the wiki on this topic. Check
urlfilters as an important area as well.
hth
Lewis
On Fri, Oct 5, 2012 at 6:08 PM, Hailong Yang hailong.yang1...@gmail.com wrote:
Dear all,
I am trying
To confirm, the gora-sql-0.1.1-incubating artifact available on maven
central IS interoperable with Nutch 2.1 release. It has not yet been
developed and brought up to date so has been disabled in more recent
Gora releases.
Thank you
Lewis
On Mon, Oct 8, 2012 at 7:32 PM, Paul Dhaliwal
Hi James,
I think this is a fair suggestion. I would please ask you to open an
issue and submit your patch which would be very welcome indeed.
As you mention, it would be interesting to check the metadata for the
value however your initial suggestion is also valid imho.
Thanks
Lewis
On Fri,
Good Afternoon Everyone,
The Apache Nutch PMC are very pleased to announce the release of
Apache Nutch v2.1. This release continues to provide Nutch users with
a simplified Nutch distribution building on the 2.x development drive
which is growing in popularity amongst the community. As well as
Hi James,
On Thu, Oct 4, 2012 at 2:59 AM, j.sulli...@thomsonreuters.com wrote:
Lewis and Chris,
Agree that The Index Structure page is very useful documentation. I went
through the fields/plugins listed in your link using Nutch 2.1 rc and most
work. I was able to get positive results for
[0]
https://svn.apache.org/repos/asf/nutch/branches/2.x/src/plugin/index-more/src/java/org/apache/nutch/indexer/more/MoreIndexingFilter.java
On Thu, Oct 4, 2012 at 7:36 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi James,
On Thu, Oct 4, 2012 at 2:59 AM, j.sulli
Good Afternoon Everyone,
We are glad to announce that the result for the Apache Nutch
release-2.1 RC#1 was successful and has passed with the following
VOTE's
4x +1 Release this package as Apache Nutch 2.1
Sebastian Nagel*
Chris Mattmann*
Lewis McGibbney*
James Sullivan
0x -1 Do not release
Hi Eyeris,
On Thu, Oct 4, 2012 at 6:24 PM, Eyeris Rodriguez Rueda eru...@uci.cu wrote:
Hi all.
I want to use Nutch 1.5.1 version, I have download nutch 1.5.1(bin) and src
also,
I think you've just uncovered a problem with the .zip archives e.g.
that the nutch script is not present in the /bin
Hi Matt,
I know there is a pile of stuff to add to this but for the time being
(until I dive into your response in detail) please see below
On Tue, Oct 2, 2012 at 11:17 PM, Matt MacDonald m...@nearbyfyi.com wrote:
Hi,
...
5) What value should I set for gora.buffer.read.limit? Currently it's
@ Ian,
Apologies, this one slipped through the net.
On Wed, Sep 26, 2012 at 8:26 PM, Ian Truslove ian.trusl...@nsidc.org wrote:
ubuntu:~/apache-nutch-svn-2.1$ ~/hadoop-1.0.3/bin/hadoop jar
build/apache-nutch-2.1.job org.apache.nutch.crawl.Crawler urls -dir urls
-depth 3 -topN 5
The
Hi CarinaBambina,
There was a bug with 1.5 so we released 1.5.1, can you please try this
instead and get back to us with your results.
Thank you
Lewis
On Tue, Oct 2, 2012 at 4:25 PM, CarinaBambina carina.rei...@yahoo.de wrote:
I'm having the same problem with Nutch 1.5. I also checked all
Hi,
For starters can you please use 1.5.1.
On Tue, Oct 2, 2012 at 4:32 PM, CarinaBambina carina.rei...@yahoo.de wrote:
Hi,
I'm curious if you have come up with any solution yet, as I'm having the
exact same problem!
When I start the crawl the entered URL is parsed perfectly, but for all
Hi Chris,
One of the main problems here is that we very rarely know which
version of Nutch you are using, what nature of configuration and in
what kind of deployment.. the truth is that this makes it difficult
for us to help you out. This is also applicable to any Hadoop, Solr,
HBase, Cassandra,
Hi Chris,
Please see here [0] for the most up-to-date account of the fields for
building your Solr index.
I tried to bring this bang up to date a while back and more recently
when writing some trivial plugin tests however please shout about
anything which is not correct and we can edit
Hi All,
Anyone else for this VOTE?
Sorry to be a pest!
Thanks
Lewis
On Fri, Sep 21, 2012 at 4:07 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Everyone,
A candidate for Apache Nutch 2.1 is available at:
http://people.apache.org/~lewismc/apache-nutch-2.1
The release
Hi Chris,
On Mon, Oct 1, 2012 at 3:27 PM, Christopher Gross cogr...@gmail.com wrote:
unzipped untarred it,
I don't think you need to do both!
BUILD FAILED
/tmp/nutch-2.0/build.xml:72: Specify at least one source--a file or
resource collection.
Mmmm... can you even try moving it out of
Hi Chris,
On Mon, Oct 1, 2012 at 4:17 PM, Christopher Gross cogr...@gmail.com wrote:
I moved it to a different directory, same errors.
Mmm. I'm stumped here. What OS are you on?
I'll try 2.1 and see if that works any better.
Please do and get back to us with your results as we are currently
with a 2.x version of Nutch anytime
soon, I just wanted to make sure that when I'm ready to deploy it'll
be a full release, so if you're really pushing for 2.1 to be out soon,
then that's what I'll work with.
-- Chris
On Mon, Oct 1, 2012 at 11:31 AM, Lewis John Mcgibbney
lewis.mcgibb
Hi Chris,
On Mon, Oct 1, 2012 at 7:09 PM, Christopher Gross cogr...@gmail.com wrote:
We have ports blocked on our box, so that may be causing issues with
Ivy (which is why I prefer just standard ant and having all the
required jars sitting in a lib directory).
Well the pro of having Nutch
Hi Chris,
On Mon, Oct 1, 2012 at 8:52 PM, Christopher Gross cogr...@gmail.com wrote:
OK, I added the port being used by hbase to iptables, and now I'm farther.
I'm getting:
12/10/01 19:44:17 ERROR fetcher.FetcherJob: Fetcher: No agents listed
in 'http.agent.name' property.
But I do have an
Hi Kiran
On Mon, Oct 1, 2012 at 7:46 PM, kiran chitturi
chitturikira...@gmail.com wrote:
I have made an improvement in patches for the parse-metatags plugin and
posted the patches here. https://issues.apache.org/jira/browse/NUTCH-1467
Great work!
Can this plugin be included in nutch-2.0 ?
Hi Bai,
If you could use the script @NUTCH-1087 [0] and provide insight into
your findings it would be very much appreciated. It is the intention
to integrate this into 2.x once it has been tested enough. The glitch
you highlight is exactly the type of stuff we need to find.
Thanks
Lewis
[0]
. Changing the brackets made the error go away, but I still
wasn't able to get nutch to run until I removed the extraneous job files.
On Fri, Sep 28, 2012 at 10:03 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Bai,
If you could use the script @NUTCH-1087 [0] and provide insight
Hi Kiran,
On Thu, Sep 27, 2012 at 3:24 PM, kiran chitturi
chitturikira...@gmail.com wrote:
Is it because nutch 2.0 does not support postgresql ? I have setup
postgresql the same way as mysql.
Yes, AFAIK HSQLDB and MySQL are the only SQL implementations
currently supported. The idea
Hi,
AFAIK this plugin has not been used extensively with Nutch 2.x however
here are some of my early observations which should get it working.
1. The plugin's plugin.xml and java source quotes code from the jsch
package [0] so you will need to grab that and make it available...
please see below
Hi Everyone,
A candidate for Apache Nutch 2.1 is available at:
http://people.apache.org/~lewismc/apache-nutch-2.1
The release candidate is a src.zip and src.tar.gz ONLY
archive of the sources in:
http://svn.apache.org/repos/asf/nutch/tags/release-2.1/
We release Nutch 2.1 in this fashion due
Hi Max,
On Thu, Sep 20, 2012 at 1:44 PM, Max Dzyuba max.dzy...@comintelli.com wrote:
Sorry for many emails.
Lewis, thanks again for a hint about parsechecker tool.
No hassle, I am glad you got it sorted and yes the parsechecker is a
great tool + saves you a bunch of time.
Best
Lewis
Hi Again,
On Wed, Sep 19, 2012 at 8:39 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
On Wed, Sep 19, 2012 at 1:54 PM, Žygimantas Medelis zzy...@gmail.com wrote:
It's the problem with gora v0.2.1 which does not work with current nutch 2.
I've just run a medium sized focused crawl
Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Again,
On Wed, Sep 19, 2012 at 8:39 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
On Wed, Sep 19, 2012 at 1:54 PM, Žygimantas Medelis zzy...@gmail.com wrote:
It's the problem with gora v0.2.1 which does not work with current nutch 2.
I've
Hi,
On Wed, Sep 19, 2012 at 3:37 PM, Max Dzyuba max.dzy...@comintelli.com wrote:
2012-09-19 16:26:16,106 INFO httpclient.HttpMethodDirector - No credentials
available for BASIC 'realm'@host.org:80
I don't understand why Nutch complains about No credentials available for
BASIC
Hi,
On Wed, Sep 19, 2012 at 1:54 PM, Žygimantas Medelis zzy...@gmail.com wrote:
It's the problem with gora v0.2.1 which does not work with current nutch 2.
Can you elaborate on what you think is wrong here? To give you some
insight here. Between Gora 0.2 and 0.2.1 a substantial effort was put
Best tool to use is the parsechecker, it is a quick neat way to see
whether your protocol/fetch/authentication is working then whether
your parser is extracting the text and metadata you require.
On Wed, Sep 19, 2012 at 8:30 PM, Max Dzyuba max.dzy...@comintelli.com wrote:
Hi Lewis,
I used that
Hi,
On Tue, Sep 18, 2012 at 2:34 PM, Žygimantas Medelis zzy...@gmail.com wrote:
Commands I am issuing
Can you read your db and see if there are any pages pending a fetch?
Also I was getting NullPointerException on inject before
changing conf/gora-cassandra-mapping.xml
from: class
Solr logs?
On Fri, Sep 14, 2012 at 9:33 PM, Bai Shen baishen.li...@gmail.com wrote:
I have a nutch 2 setup that I got working with solr about a month ago. I
had to shelve it for a little while and I've recently come back to it.
Everything seems to be working fine except for the solr
Hi,
Try index-more
http://wiki.apache.org/nutch/FAQ#How_can_I_find_out.2BAC8-display_the_size_and_mime_type_of_the_hits_that_a_search_returns.3F
hth
Lewis
On Fri, Sep 14, 2012 at 9:22 PM, Eyeris Rodriguez Rueda eru...@uci.cu wrote:
Hi, all.
I have been using nutch and solr for 1 year and I need
Hi,
On Fri, Sep 14, 2012 at 12:59 AM, dpverma patia...@gmail.com wrote:
I am using tomcat6, nutch1.1 and solr1.4
For starters this is probably your main mistake! I would seriously
urge you to upgrade your Nutch distribution.
I've just used the parsechecker with -dumpText and your URL and I get
Hi,
Nice one Julien, yeah I hope to see you and others in Sinsheim in November
and am looking forward to attending your talk... then lots of lager afterwards.
Best
Lewis
On Thu, Sep 13, 2012 at 11:39 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Hi,
I'd just like to mention that I
Hi,
On Wed, Sep 12, 2012 at 1:41 PM, Stefan Scheffler
sscheff...@avantgarde-labs.de wrote:
I try to run nutch 2.0 on a hadoop cluster and get the following exception.
HADOOP_CLASSPATH=lib/apache-nutch-1.6-SNAPSHOT.jar hadoop
org.apache.nutch.crawl.Crawl urls -dir test -depth 2 -topN 5
The
Hi,
Take a look at ftp.content.limit property in nutch-default.xml and set
it accordingly in nutch-site.xml
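As a rough sketch, the override in conf/nutch-site.xml might look like the fragment below. It belongs inside the file's configuration element, and the value is purely illustrative (an assumption, not a recommendation), so check the property's description in your nutch-default.xml for the exact semantics:

```xml
<!-- conf/nutch-site.xml: overrides the default from nutch-default.xml.
     The value below is illustrative only. -->
<property>
  <name>ftp.content.limit</name>
  <value>1048576</value>
  <description>Length limit, in bytes, for content downloaded over FTP
  (illustrative value; pick one that suits your documents).</description>
</property>
```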
Thanks
Lewis
On Tue, Sep 11, 2012 at 12:20 AM, dpverma patia...@gmail.com wrote:
Can you pls let me know how you solved your problem?
I am also getting the same error which you had.
Hi Kiran,
I think having line numbers is a bad option for an exploration of the
codebase. It really does us no favours as the codebase changes through
time.
Currently (even without looking at them) I can most certainly tell you
that they will not be 100% accurate. If you are able to provide more
Hi Matt,
I don't know if you got my message on the github mirror issue.
If you could get the patch uploaded to a new Nutch Jira ticket (unless
one is already open) then I will be very happy to test as some free
time now means I am able to test a few patches.
Thanks
Lewis
On Sat, Sep 8, 2012
You can of course change the source and destination field mappings so
that you don't need to query URL id's
This is a workaround though and doesn't fully address the issue of
querying by URL id.
Lewis
On Sat, Sep 8, 2012 at 2:22 PM, Alaak al...@gmx.de wrote:
Hi,
I have a problem with the way
Hi,
On Thu, Sep 6, 2012 at 5:50 AM, gaurav.gupta gaurav.gu...@edynamic.info wrote:
C:\nutch\local\conf\crawl-urlfilter.txt as specified in my above post.
This no longer exists... that might be a problem
--
Lewis
'title'='Sabancı
Üniversitesi'
Is it because of 'Sabancı Üniversitesi'? SOLR/example/solr/conf/schema.xml
specifies UTF-8
Regards,
On 09/04/2012 05:04 PM, Lewis John Mcgibbney wrote:
I don't think you have your HSQLDB server running, this is an essential
requirement to store the crawldb
Hi Vijith,
On Wed, Sep 5, 2012 at 5:55 AM, Vijith vijithkv...@gmail.com wrote:
Are you able to submit a patch for this?
you mean a patch for the build.xml file... surely I can.
Excellent :0)
I also noticed that there is currently no way to copy the compiled
plugin test cases through to
I think you've incorrectly passed your regex- as your seed URL list
when you've injected.
As a side note it is always VERY helpful to provide basic info such as
the Nutch version, the steps you took to reproduce the error, etc...
basic stuff.
hth
Lewis
On Wed, Sep 5, 2012 at 10:16 AM,
I don't think you have your HSQLDB server running, this is an essential
requirement to store the crawldb, WebPage and Host data etc.
You can follow the various tutorials here to get you going
http://wiki.apache.org/nutch/#Nutch_2.X_tutorial.28s.29
hth
Lewis
On Tue, Sep 4, 2012 at 2:27 PM, Tolga
If you look at lines 395-399 in build.xml [0] you need to add
<copy file="${test.src.dir}/crawl-tests.xml"
todir="${test.build.classes}"/>
<copy file="${test.src.dir}/domain-urlfilter.txt"
todir="${test.build.classes}"/>
copy
Before doing runtime local you need to ensure the tests are executed
and all of the resources are present in the build directory.
So please do ant test, then ant runtime, all of the test resources
should then be moved to the runtime/local directory.
The runtime target does NOT rely on the test
Hi Max,
On Tue, Aug 28, 2012 at 3:24 PM, Max Dzyuba max.dzy...@comintelli.com wrote:
Is it possible to use the same crawldb but store segment data in a different
directory for consecutive crawls using the bin/nutch crawl command? I
thought that there is no option to specify the path to crawldb
Hi Ye,
If you could contribute this to the community as a patch it would be
greatly appreciated.
If you need any help with this then please ping us on dev@nutch and we
will be more than happy to help you out.
Thank you in advance
Lewis
On Thu, Aug 30, 2012 at 2:14 PM, Ye T Thet
sorry speech marks
just run
ant runtime
It most certainly works, if it does not then there is something wrong
with your local copy.
On Wed, Aug 29, 2012 at 7:18 AM, Tolga to...@ozses.net wrote:
What brackets? I don't see brackets.
On 08/28/2012 03:39 PM, Lewis John Mcgibbney wrote:
I
Please have a look at the discussion below
http://www.mail-archive.com/user@nutch.apache.org/msg04176.html
It should help you out.. or point you in the correct direction at least.
hth
Lewis
On Wed, Aug 29, 2012 at 1:13 PM, ytthet yethura.t...@gmail.com wrote:
Hi Folks,
I am indexing local
What version of Nutch is this?
Lewis
On Wed, Aug 29, 2012 at 9:58 AM, xpow swirja...@gmail.com wrote:
Hello,
I've tried to use the protocol-smb plugin with nutch. Nutch read and
parsed the documents correctly, but afterward, when it hit the crawldb,
crawl.CrawlDbReducer, I got a lot of
In the SVN area can you point me to the protocol plugin please?
http://svn.apache.org/repos/asf/nutch/
Thank you
Lewis
On Wed, Aug 29, 2012 at 3:22 PM, Matteo Simoncini sicc...@gmail.com wrote:
Sorry, I forgot it.
1.5
Matteo
2012/8/29 Lewis John Mcgibbney lewis.mcgibb...@gmail.com
Simoncini sicc...@gmail.com wrote:
I'm not so familiar with SVN. Is this what you mean?
http://svn.apache.org/repos/asf/nutch/branches/branch-1.5/
Matteo
2012/8/29 Lewis John Mcgibbney lewis.mcgibb...@gmail.com
In the SVN area can you point me to the protocol plugin please?
http
Please see the tutorial and search on the user lists (you can find
plenty of info on this via our website)
http://www.mail-archive.com/user%40nutch.apache.org/
http://wiki.apache.org/nutch/#Other_Tutorial.28s.29
On Wed, Aug 29, 2012 at 4:22 PM, makaveli91ro makaveli9...@yahoo.com wrote:
Hello
try ant runtime
This will generate the runtime deployment(s) you require to get going,
however it _does_not_ give you a ready to rock deployment.
You should check out the following tutorials below
http://wiki.apache.org/nutch/Nutch2Tutorial
http://nlp.solutions.asia/?p=180
Lewis
On Mon, Aug
.
On Sun, Aug 26, 2012 at 3:39 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Robert,
On Sun, Aug 26, 2012 at 5:25 AM, Robert Irribarren rob...@algorithms.io
wrote:
org.apache.solr.common.SolrException: Server Error
Server Error
...
Please read this [0] before
The crawldb needs to receive updates of data in fetched segments, once
you generate it will calculate what needs to be fetched in next
iteration. It is OK to store segments in different locations but
typically you would want to maintain one crawldb for all of your
records... unless of course you
:46 PM, Tolga to...@ozses.net wrote:
Do I need HBase as well?
On 08/27/2012 03:00 PM, Lewis John Mcgibbney wrote:
try ant runtime
This will generate the runtime deployment(s) you require to get going,
however it _does_not_ give you a ready to rock deployment.
You should check out
Further to Markus' comments, please also see
<property>
<name>parser.skip.truncated</name>
<value>true</value>
<description>Boolean value for whether we should skip parsing for
truncated documents. By default this
property is activated due to extremely high levels of CPU which
parsing can sometimes
You can easily run any plugin from the terminal using
./bin/nutch plugin
in the case of the HtmlParser main() method you would want to do
./bin/nutch plugin parse-html org.apache.nutch.parse.html.HtmlParser
$pathToLocalFile
You have actually identified an improvement which we could do with
Hi Robert,
On Sun, Aug 26, 2012 at 5:25 AM, Robert Irribarren rob...@algorithms.io wrote:
org.apache.solr.common.SolrException: Server Error
Server Error
...
Please read this [0] before posting to the list. It saves both you and
us loads of time and also means there is less unnecessary
Hi,
@Alxsss I hope Walter's suggestion(s) help you out here.
@Walter I've added your model answer to the wiki [0] this is a great
response and I just couldn't help but add it. Thank you
Lewis
[0]
. Proud to have been around since 2005 (7 of them!)
:)
Cheers,
Chris
On Aug 9, 2012, at 1:31 PM, Lewis John Mcgibbney wrote:
Nice one Julien
I'm going to update the site with this as it's a pretty huge milestone
@Apache and a lot of projects and current developers owe
Hi Robert,
There is a parse-swf plugin for Nutch which uses the JavaSWF library
[0] to parse such files (of what version I am not currently aware) and
I can confirm that it does work e.g. when used from command line I can
obtain parse data from within a local swf file.
I am not sure if this