Hi Jon,
On Wed, Nov 20, 2013 at 11:08 AM, user-digest-h...@nutch.apache.org wrote:
I do see that stuff is getting into Accumulo but, in my inexperienced
opinion, it looks like the map method is never getting called in the job.
I'm not sure if this is supposed to happen after the
Hi Jon,
As you've guessed by now this is not so much a Nutch specific problem.
I'm CC'ing user@gora in here as well.
On Fri, Nov 15, 2013 at 8:05 PM, user-digest-h...@nutch.apache.org wrote:
I was wrong. So changing the gora.datastore.accumulo.user property caused
the inject to finish and on
Hi Jon,
On Thu, Nov 14, 2013 at 4:15 PM, user-digest-h...@nutch.apache.org wrote:
Unable to inject seeds with
29017 by: Jon Uhal
First, here is my environment:
Hadoop 1.2.1
Accumulo 1.4.4
Zookeeper 3.4.5
Gora 0.3
Solr 4.5.1
All software revisions look fine so good start :)
Hi Jon,
Glad to hear that you're making some more progress!
On Thu, Nov 14, 2013 at 8:45 PM, user-digest-h...@nutch.apache.org wrote:
So I think it has to do with Accumulo somehow. I reverted the
conf/gora.properties setting for mock from false to:
gora.datastore.accumulo.mock=true
and
Hi Edward,
On Fri, Nov 8, 2013 at 8:46 PM, user-digest-h...@nutch.apache.org wrote:
user Digest 8 Nov 2013 20:46:59 - Issue 2099
As to the host table, I am not quite sure about its function, like in
which step (inject, generate, fetch, parse, updatedb, updatehostdb) is this
host table get
Hi Olle,
On Tue, Nov 5, 2013 at 1:29 PM, user-digest-h...@nutch.apache.org wrote:
user Digest 5 Nov 2013 13:29:55 - Issue 2097
Hi Lewis,
Just a quick question - I'm having a slight problem with the NUTCH-828v3
patch. I check out nutch trunk, make sure it runs ok, then apply the patch.
Hi Olle,
On Sun, Nov 3, 2013 at 9:56 AM, user-digest-h...@nutch.apache.org wrote:
user Digest 3 Nov 2013 09:56:44 - Issue 2096
Re: user Digest 30 Oct 2013 00:57:14 - Issue 2094
28926 by: Lewis John Mcgibbney
28929 by: Olle Romo
Thanks for the reply :)
I just
Hi Olle,
On Wed, Oct 30, 2013 at 12:57 AM, user-digest-h...@nutch.apache.org wrote:
NUTCH-828 fetch filter
28911 by: Olle Romo
Has anyone been able to make the nutch-828 patch fetch filter work with
1.7?
Have you tried taking the patches and manually going through them
Hi Yasin Julien,
On Thu, Oct 24, 2013 at 8:06 PM, user-digest-h...@nutch.apache.org wrote:
add it to the metadata at the protocol level. Have you checked that there
isn't a patch for that in Jira? If not please create one
Yeah there is a patch for this which has stagnated somewhat.
You
Hi sujit,
On Tue, Oct 22, 2013 at 8:03 PM, user-digest-h...@nutch.apache.org wrote:
can't find Hadoop executable
28860 by: sujit rai
Can't find Hadoop executable. Add HADOOP_HOME/bin to the path or run in
local mode.
This is not a Nutch specific problem. There are numerous threads
Hi,
On Wed, Oct 16, 2013 at 9:50 AM, user-digest-h...@nutch.apache.org wrote:
Releases of Nutch 2 tend to follow releases of Gora as it relies on it an
awful lot.
We're working on the Avro upgrade in Gora and finally (after a long time
away from the task at hand) I am finding time to
Hi Patrick,
On Sat, Sep 28, 2013 at 10:10 PM, user-digest-h...@nutch.apache.org wrote:
1. I use this command to start the crawling, as stated in the tutorial
/bin/bash ./bin/crawl urls/seed.txt TestCrawl http://localhost:8983/solr/2
So when the crawled pages will be sent to Solr for
Nice Julien.
Looking forward to seeing these talks online. Gutted I will not be there.
Best
Lewis
On Wed, Sep 25, 2013 at 12:52 AM, Julien Nioche
lists.digitalpeb...@gmail.com wrote:
Hi,
I will be giving a talk on Nutch at Lucene/SOLR Revolution in Dublin (4/7
Nov).
There should be quite
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer#Step_One:_Using_the_Mailing_Lists
On Thursday, August 29, 2013, Ralf R. Kotowski r...@enlle.com wrote:
Nutch 2.2.1 on Mysql
Got now a large number of docs, I interrupted Fetch several times with
[cntrl]+C and now I get this when running
Which version of Nutch do you use here, Kaveh? Can you paste the full stack
trace?
On Wednesday, August 28, 2013, kaveh minooie ka...@plutoz.com wrote:
I was wondering if this has happened to anyone else. every once in a while
my update map task fails with hadoop showing this message:
Too many
hi Brian
there's no doubt that we should add this to plugin central.
do you have an interest in doing so?
good to hear you got it working
lewis
On Monday, August 26, 2013, brian4 bqu...@gmail.com wrote:
Figured out the issue was I was not explicitly including the metadata
field
in the indexing
are attached. the
one that has all the stack trace in it is the one that actually finished
successfully.
On 08/29/2013 09:16 AM, Lewis John Mcgibbney wrote:
Which version of Nutch do you use here, Kaveh? Can you paste the full stack
trace?
On Wednesday, August 28, 2013, kaveh minooie ka
Hi Jonathan,
This has been a long outstanding issue IIRC.
I have not used Nutch for feed crawling for a while if I am honest, and I
honestly can't recall when and if I have done it with 2.x.
You will see [0], that by default the plugin is not actually initialized.
So for starters you should
ParseResult!
Thank you!
-- Original Message --
From: lewis john mcgibbney [via Lucene]
ml-node+s472066n4087394...@n3.nabble.com;
Sent: Friday, August 30, 2013, 9:34 AM
To: 基勇252637...@qq.com;
Subject: Re: How nutch2.2 to parse rss?
Hi Jonathan,
This has been a long
ajax-solr
On Wednesday, August 28, 2013, Ralf R. Kotowski r...@enlle.com wrote:
So, basically Drupal becomes the Front-end? Interesting
-Original Message-
From: Nicholas Roberts [mailto:niccolo.robe...@gmail.com]
Sent: Thursday, August 29, 2013 1:55 AM
To: user@nutch.apache.org
to [Wrong
Password]
It seems as though the password is not going correctly to the proxy
server. I have set all required proxy parameters correctly in
nutch-site.xml.
Any clues?
Suresh.
-Original Message-
From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com]
Sent
Hi Brian,
When getting your metadata from the WebPage are you obtaining the full map
e.g. page.getMetadata() or are you trying to get a Key and obtain the
ByteBuffer value? e.g. page.getFromMetadata(Utf8 Key)?
The latter will return null if nothing is present which is normal but means
that the
Hi Jeffery,
Sorry about length of time to respond.
Did you get a solution here?
I wonder if this has to do with your crawlId?
I would definitely say that an indexing plugin is the way to go here. Put
the outlinks to avro map in WebPage then get them and add them to your doc.
On Thursday, August
I am sure that Renato (if he is watching) can plugin maybe as well.
We find in Gora that in every sense of the word, native Hadoop stores such
as Avro, HBase and Accumulo when we execute a query with GoraInputFormat
via getPartitions we retrieve GoraInputSplits natively which means splits
are
Hi Tracy,
Logs are always your friend.
Take it step by step [0], look at your logs and read the web db after every
step to see what's going on.
hth
Lewis
[0]
http://wiki.apache.org/nutch/NutchTutorial#A3.2_Using_Individual_Commands_for_Whole-Web_Crawling
On Thu, Aug 22, 2013 at 1:44 PM, tracy
Hi Ward,
The main problem with using this set up seems to have been the
gora-sql-mapping.xml config file. The one which ships with Nutch was only a
guide and has been proven time after time to be unsuitable for many set ups.
This being said, it should be noted that the entire gora-sql module is
have
time to use tools, not time to contribute much. I was merely pointing out
the lack of documentation for Nutch v2.
On Tue, Aug 20, 2013 at 4:36 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Yes please Andrew.
If you can stick some time in to this then it would be greatly
Nice work.
The patch looks good and I would be +1 to getting it in to the codebase.
Thanks
Lewis
On Wednesday, August 21, 2013, kamaci furkankam...@gmail.com wrote:
Currently you can not see how many documents are added to Solr Server. One
could see how many documents are added to Solr server
John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Andrew,
It seems that from the core Nutch team the time and desire is not
there
right now to push forward with your proposals.
This DOES NOT mean that the proposals will be ignored.
I would FULLY back convenient packaging
of others' lives, a whole lot easier. I'll see what I can get
done during downtime, probably prioritizing Homebrew first.
On Fri, Aug 16, 2013 at 12:06 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Andrew,
Have you written and maintained Debian packages before?
I opened a Jira
Hi Andrew,
Have you written and maintained Debian packages before?
I opened a Jira issue for this a while back and as far as I know it is a
far from trivial process but I would be very very interested to get a
Debian package for Nutch and look at a mac package if so required.
There is an issue of
http.content.length override?
I haven't checked your URL (although I do like your taste in music :) ) but
this is a possible source.
hth
Lewis
On Thursday, August 15, 2013, porcelet jeremy.ponce...@outlook.com wrote:
Hello, I'm trying to index all beatles tabs from www.ultimate-guitar.com but
Hi Brian,
I've never seen this before.
I found this however
http://web.archiveorange.com/archive/v/L9Ul807Yu77D5QW7PGPn
I know posting links to resolve problems is not ideal... but as I said I've
never seen it before. Interesting though that this happens intermittently
same as in the issue
Hi Kaveh,
No, you're not missing anything...
crawlID is not equal to the Cassandra keyspace (keyspaces by default set to
webpage for webdb and host for hostdb) instead the crawlId can be used to
generate, identify, maintain, etc. different datasets which can all belong
to the same keyspace.
If you
Hi Ralf,
AFAICS this would be much better suited to hbase user list.
Sorry I can't help more
On Friday, August 9, 2013, Ralf R. Kotowski r...@enlle.com wrote:
Nutch 2.2.1
Hbase 0.90.4
Solr 4.4.0
Fedora Core 19
Sun Java (latest)
Error Msg: Hbase is able to connect to Zookeeper but the
Hi Kaveh,
N.B. Taking this to user@gora and after this mail please drop user@nutch
Quick question, is your cassandra server up and running at default port
9160?
On Fri, Aug 9, 2013 at 3:36 PM, kaveh minooie ka...@plutoz.com wrote:
Hi Everyone
So I don't know if I am doing something wrong
baishen.li...@gmail.com wrote:
Is it possible to run a web server and connect to them that way? That was
what I ended up doing.
On Tue, Aug 6, 2013 at 4:58 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi,
Struggling with this one. And yes I acknowledge that it is not really
that way? That
was
what I ended up doing.
On Tue, Aug 6, 2013 at 4:58 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi,
Struggling with this one. And yes I acknowledge that it is not really
a
Nutch based question but hopefully someone can help...
I have a directory
Hi Tejas, Thanks this looks like the key ;)
On Tue, Aug 6, 2013 at 9:51 PM, Tejas Patil tejas.patil...@gmail.com wrote:
Hi Lewis,
Can you try the patch attached over here:
https://issues.apache.org/jira/browse/NUTCH-1483
Thanks,
Tejas
On Tue, Aug 6, 2013 at 7:24 PM, Lewis John Mcgibbney
PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi,
Now using Nutch trunk 1.8-SNAPSHOT HEAD
Back at this tonight. When attempting to fetch
file://home/law/Downloads/asf/solr-4.3.1/example/e001 (notice two
slashes)
which contains loads of HTML files, I get
Hi Rui,
Please open a Jira for this and patch up 2.3-SNAPSHOT if you are able.
You are right, it's probably about time to get rid of the class and entry
within the bin/nutch script... or at least to log a HUGE WARN message
when the class is invoked to say that it is deprecated and should not be
like to do that, but it seems I don't have permission to commit
the code. Could someone give me the access? Thanks.
Rui
At 2013-08-08 10:30:21,Lewis John Mcgibbney lewis.mcgibb...@gmail.com
wrote:
Hi Rui,
Please open a Jira for this and patch up 2.3-SNAPSHOT if you are able.
You
There is a benchmark class... which can be invoked from the nutch script I
think. We can maybe extend this for some provisional benchmarks and post
the stats on the Nutch wiki/site?
wdyt?
On Tuesday, August 6, 2013, Julien Nioche lists.digitalpeb...@gmail.com
wrote:
Hi Otis,
That certainly
remain in
their native path form.
On Tue, Aug 6, 2013 at 1:58 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi,
Struggling with this one. And yes I acknowledge that it is not really a
Nutch based question but hopefully someone can help...
I have a directory path as follows
Hi,
Now using Nutch trunk 1.8-SNAPSHOT HEAD
Back at this tonight. When attempting to fetch
file://home/law/Downloads/asf/solr-4.3.1/example/e001 (notice two slashes)
which contains loads of HTML files, I get the error as below.
Fetcher: throughput threshold retries: 5
-finishing thread
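
For what it's worth, here is a small illustration of why the slash count in a file: URL matters. It uses Python's standard URL parser rather than Nutch's own code, so treat it as an analogy and not as a diagnosis of this particular error: with two slashes, a generic parser reads the first path segment as a host name.

```python
from urllib.parse import urlsplit

# Two slashes: the first path segment ("home") is parsed as the host.
two = urlsplit("file://home/law/Downloads/asf/solr-4.3.1/example/e001")

# Three slashes: the host is empty and the absolute path is kept whole.
three = urlsplit("file:///home/law/Downloads/asf/solr-4.3.1/example/e001")

print(repr(two.netloc), two.path)
print(repr(three.netloc), three.path)
```

With the two-slash form the path no longer starts at /home, which may be worth ruling out before looking elsewhere.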
Thanks.
Great job.
On Wed, Jul 31, 2013 at 5:45 PM, claudiuchis claudiuchi...@gmail.com wrote:
Hi Lewis,
I've created patch NUTCH-1294-v3.patch.
Here are the steps I followed:
$ svn checkout http://svn.apache.org/repos/asf/nutch/tags/release-2.2.1
$ cd release-2.2.1
$ patch -p0
Hi Claudiu,
Can you please attach your new patch if possible to the issue and we can
try it out. I would be keen to get this in to the codebase.
Thank you very much for getting back here.
Best
Lewis
On Wed, Jul 31, 2013 at 2:42 PM, claudiuchis claudiuchi...@gmail.com wrote:
Hi Lewis,
The
Makes perfect sense.
I wonder if this is something to do with the Solr side?
Do you have some logs you can view?
On Tue, Jul 30, 2013 at 9:48 AM, dogrdon dgor...@planning.org wrote:
oh, sorry, I just obscured it for the purposes of posting because I did not
want to publish our solr
Great.
On Tue, Jul 30, 2013 at 9:16 AM, Weder Carlos Vieira weder.vie...@gmail.com
wrote:
Hello Lewis,
I changed the ivy.xml GORA 0.3 to 0.2.1 version.
Weder
On Tue, Jul 30, 2013 at 1:09 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi,
AFAIK, MySQL
https://issues.apache.org/jira/browse/NUTCH-1294
Would be really really great if you could try this out and comment on this
issue.
Another tool we would then need to port to pluggable indexing.
hth
Lewis
On Tue, Jul 30, 2013 at 11:10 AM, claudiuchis claudiuchi...@gmail.com wrote:
Hi folks,
I
I did not port I merely helped a bit :)
Dan Rosher was the driving force behind this one!
Thanks for any feedback.
Best
On Tue, Jul 30, 2013 at 11:23 AM, claudiuchis claudiuchi...@gmail.com wrote:
Hi Lewis.
Thank you for porting SolrClean to the 2.x branch.
I'll apply the patch and let you
Hi,
On Tue, Jul 30, 2013 at 3:29 PM, claudiuchis claudiuchi...@gmail.com wrote:
...
snip
...
4. I applied the patch
cd /usr/local/nutch-2.2.1
patch -p0 < NUTCH-1294-v2.patch
The patch didn't update src/bin/nutch and conf/log4j.properties for
some
reason. I've updated these manually.
If
pandey devangpande...@gmail.com wrote:
@lewis ... Thanks for replying. The thing is, using readdb I can read my crawldb
but it shows only fetcher time. How can I find the fetcher start and end time
as suggested by you?
On Sat, Jul 27, 2013 at 6:05 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com
Hi,
On Sun, Jul 28, 2013 at 2:38 PM, dogrdon dgor...@planning.org wrote:
2013-07-26 16:55:31,593 INFO solr.SolrDeleteDuplicates -
SolrDeleteDuplicates: Solr url: http://domain:port/solr/core0/
...
Caused by: java.lang.NullPointerException at
This Solr URL looks incorrect.
Please look at recent list archives.
SqlStore is deprecated.
Thanks
Lewis
On Friday, July 26, 2013, EarthMan huangrong...@gmail.com wrote:
Hello Weder,
Have you solved this problem with nutch 2.2?
If yes can you share the solution? thank you.
I get the same error below:
Exception in thread
by looking at the fetcher start time and finish time Sir.
There should be methods in TimingUtil to help you.
hth
On Friday, July 26, 2013, devang pandey devangpande...@gmail.com wrote:
Hello, I am working on Nutch 1.4 to crawl certain domains. Now after
successful crawling I want to get start
Hi Brian,
Gora =0.3 deprecates the gora-sql 0.1.1-incubating artifact.
This means Nutch 2.2.1 and MySQL/HSQLDB are incompatible.
Lewis
On Wed, Jul 24, 2013 at 12:42 PM, brian4 bqu...@gmail.com wrote:
It definitely has nothing to do with HBase - I switched to use MySQL and I
am
still having
Hi,
On Wed, Jul 24, 2013 at 2:02 PM, band_master swirlanalyt...@gmail.com wrote:
After reading up a bit more, I see the 'crawl'
function is deprecated in Nutch2 in favor of a java file located in
'bin/crawl' that executes each command in sequence.
It is a replacement script which chains the
Hi band_master,
On Tue, Jul 23, 2013 at 1:20 PM, band_master swirlanalyt...@gmail.com wrote:
I am having trouble, though, getting Nutch to work. I can successfully
inject urls, but there seems to be an error in the Hadoop log around
parsing
UTF8 characters.
How are you coming to this
Hi Alex,
About now is a good time to read how Nutch deals with classloading.
Navigate to plugin central on the wiki and you will see the documentation.
hth you out
Lewis
On Tuesday, July 23, 2013, AC Nutch acnu...@gmail.com wrote:
Hi All,
I'm attempting to build a Nutch plugin on Nutch 1.7
added
to the plugin class-loader. However, that doesn't appear to be the case -
I
must be missing something, but I'm not sure what that is...?
Alex
On Tue, Jul 23, 2013 at 11:30 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
Hi Alex,
About now is a good time to read how
On Sat, Jul 20, 2013 at 10:58 PM, Rui Gao gaorui...@163.com wrote:
I checked the DB, the URL is already in DB.
The plugin property is configured like this:
<property>
<name>plugin.folders</name>
<value>./src/plugin,./plugins</value>
</property>
Any reason at all that you have two directories listed? Do you
Hi Rui,
You can't use this version of HBase in your search stack with Nutch 2.x
right now.
You need to downgrade quite significantly to 0.90.x
Thanks
Lewis
On Saturday, July 20, 2013, Rui Gao gaorui...@163.com wrote:
Hi,
I try to setup Nutch2.2.1 + hbase-0.94.9 + eclipse + cygwin on Windows
Hi Rui,
On Saturday, July 20, 2013, Rui Gao gaorui...@163.com wrote:
So, what direction will Nutch go? Will it co-operate with relational
databases or will it only work on non-relational databases like hbase?
This has nothing to do with Nutch. It has everything to do with Apache Gora
and we
Hi Martin,
On Saturday, July 20, 2013, Martin Aesch martin.ae...@googlemail.com
wrote:
I have about 25K URLs per map task and around 8M URLs total
All 6 mappers run and have continuously output. The aggregated parse
rate is 100URLs/sec.
wow this is painstakingly slow indeed. This was similar
, there's a link
talking about how to integrate nutch with mysql:
http://nlp.solutions.asia/?p=362
Do you have any suggestion?
Thanks.
Best Regards,
Rui
At 2013-07-11 03:53:12,Lewis John Mcgibbney lewis.mcgibb...@gmail.com
wrote:
Hi Rui,
This should not work.
The SqlStore module and support
John Mcgibbney lewis.mcgibb...@gmail.com
wrote:
Please read the exception trace. You are running on Hadoop? You need to
ensure that your plugins.directory points to the right path. There is also
a mention of a missing job file. Please ensure that your nutch job file is
on the Hadoop jobtracker
Hi Brian,
On Thursday, July 18, 2013, brian4 bqu...@gmail.com wrote:
On one machine, nutch just suddenly started freezing during the generator
job.
Are these continuous crawls? What values do
you have set for generate.max.count? I ask as calls must be made to the
backend to determine a limit for
Hi,
On Fri, Jul 19, 2013 at 9:43 AM, dogrdon dgor...@planning.org wrote:
+^http://www.oursite.com/([a-z0-9\-A-Z]*\/)* in the regex-urlfilter.txt
file
should be
-^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*
Thanks
Lewis
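
To make the effect of the leading sign concrete, here is a minimal sketch in Python (not Nutch's own Java code). It assumes the filter works the way I understand it to: rules are tried in order, the first pattern found in the URL decides, + accepts, - rejects, and a URL matching no rule is rejected. The make_filter helper, the sample URLs and the trailing catch-all "+." rule are all illustrative assumptions.

```python
import re

def make_filter(rules):
    """Compile (sign, pattern) rules; sign is '+' (accept) or '-' (reject)."""
    compiled = [(sign, re.compile(pattern)) for sign, pattern in rules]

    def accepts(url):
        # The first rule whose pattern is found in the URL decides the outcome.
        for sign, pattern in compiled:
            if pattern.search(url):
                return sign == "+"
        # Assumption: a URL that matches no rule is rejected.
        return False

    return accepts

PATTERN = r"^http://www.oursite.com/([a-z0-9\-A-Z]*\/)*"

include = make_filter([("+", PATTERN)])
# Assumption: a catch-all "+." rule follows the exclusion.
exclude = make_filter([("-", PATTERN), ("+", r".")])

url = "http://www.oursite.com/news/2013/"
print(include(url), exclude(url), exclude("http://example.org/"))
```

With the + sign the site's URLs are let through; with the - sign they are dropped and only the other URLs reach the catch-all rule.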
Hi Brian,
On Friday, July 19, 2013, brian4 bqu...@gmail.com wrote:
No not continuous or large-scale. Crawls are just run each day. The
machine that has the freezing issue was the one I was planning to use to
do
the daily crawls.
Think this is most certainly a local config bottleneck. Nutch
Sorry I misunderstood completely.
You can enable filtering (and normalizing) for the solr-indexer job in trunk
http://wiki.apache.org/nutch/bin/nutch%20solrindex
This will enable you to crawl everything but restrict what gets sent down
to the index from your crawldb.
hth
Lewis
On Friday, July
Hi,
People are using both. People are finding bugs, improving code and making
better software with every release we push. It is fair to say that 1.x is a
mature and production ready piece of software. There is absolutely no doubt
about this.
2.x has come a long way over the last few releases.
Hi Martin,
Have you checked that all mappers are working while parsing job is running?
How many URLs are you trying to parse here?
On Friday, July 19, 2013, Martin Aesch martin.ae...@googlemail.com wrote:
Dear nutchers,
Having Nutch 2.2.1/HBase 0.90.6/Hadoop 1.1.2/6Mappers/6Reducers/Core
I am not looking at the code.
Can you explain what you're expecting to happen please?
On Thursday, July 18, 2013, vivekvl vive...@yahoo.com wrote:
I found an issue in shouldFetch() of AbstractFetchSchedule class. (Nutch
2.1)
Here even when (fetchTime - curTime > maxInterval * 1000L), the method
Hi Tony,
On Thursday, July 18, 2013, Tony Mullins tonymullins...@gmail.com wrote:
Currently in Nutch2.x SolrDeDup job runs on entire index.
Is it possible to configure it to run against the current batch Id ?
It will be possible. There are various issues open (and patches) for 2.3
which deal
Hi,
Please grab the most recent Nutch 2.2.1 release from our downloads page.
A description of the codebase is available on the Nutch home page.
You can use this with different NoSQL backends. Tutorials are available on
the Nutch wiki.
hth
Lewis
On Fri, Jul 12, 2013 at 4:02 AM, devang pandey
Hi Brian,
On Fri, Jul 12, 2013 at 11:57 AM, brian4 bqu...@gmail.com wrote:
What am I doing wrong?
You're doing nothing wrong. We would need to submit a patch for this to
get it working.
Currently, when the plugins are compiled and tested, I *think* that the
generated plugin test
Hi,
On Friday, July 12, 2013, Yves S. Garret yoursurrogate...@gmail.com wrote:
1 - Is there a web-gui interface that will enable me to look over the
different search terms that I can
use and what searches are going on? If so, how can I solve this problem?
Nutch 2.x has a REST interface via
The gora-sql artifact is now deprecated. Please read your ivy.xml
descriptor for reasoning and logic.
We advise you to use another storage mechanism... the options are also in
the ivy.xml descriptor.
hth
Lewis
On Thursday, July 11, 2013, Ramakrishna ramakrishna...@dioxe.com wrote:
When i use
Please check the syntax you are using for the cli arguments. It is all
wrong.
You can see correct usage syntax on nutch tutorial or on command line
options.
hth
Lewis
On Friday, July 12, 2013, Ramakrishna ramakrishna...@dioxe.com wrote:
Injector: starting at 2013-07-12 18:17:41
Injector:
No, the 2.2.1 webpage schema will be the same.
Nutch 2.1 introduced the concept of batchId for URLs but there is no change
in most recent.
Thanks
Lewis
On Tuesday, July 9, 2013, A Laxmi a.lakshmi...@gmail.com wrote:
Hello,
I could use Nutch 1.6 and 2.1 without any issues in the past. However, now
Can you show the relevant part of the segment dump?
On Wed, Jul 10, 2013 at 4:10 AM, devang pandey devangpande...@gmail.com wrote:
Hello, I am using the readseg command to read a segment corresponding to a
particular url.
But the output contains nutch status 67. What exactly is nutch status 67
Hi,
On Tue, Jul 9, 2013 at 1:03 AM, imran khan imrankhan.x...@gmail.com wrote:
I have gone through the source code of this plugin but couldn't find any
code which could be affect the value of boost field.
Assuming that you are using 2.2.1 or 2.X HEAD, the boost field as assigned
to the
Hi Rui,
This should not work.
The SqlStore module and support for it is now deprecated within Apache Gora.
If you would like to downgrade to use Nutch 2.1, then you can use older
Gora artifacts but this is not recommended.
Thanks
Lewis
On Sun, Jul 7, 2013 at 12:36 AM, Rui Gao gaorui...@163.com
Please look for mapred-site.xml in hadoop conf directory. you can specify
mapred.reduce.tasks and set an int for this value
You will need to restart the jobtracker for this to kick in I would imagine.
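
As a rough sketch of what such an override looks like, the snippet below embeds a hypothetical mapred-site.xml and reads the value back with Python's standard XML parser as a quick sanity check. The value 4 and the read_int_property helper are illustrative assumptions; the right number depends on your cluster.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapred-site.xml content; the value 4 is only an example.
MAPRED_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>4</value>
  </property>
</configuration>
"""

def read_int_property(xml_text, name):
    """Return the integer value of a Hadoop-style property, or None if absent."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return int(prop.findtext("value"))
    return None

print(read_int_property(MAPRED_SITE, "mapred.reduce.tasks"))
```

Checking the file this way only confirms what is written on disk; it does not tell you whether the running jobtracker has picked the value up.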
On Wednesday, July 3, 2013, Sznajder ForMailingList
bs4mailingl...@gmail.com wrote:
Hi
When
in mapred-site.xml
It is your Mapreduce configuration override.
hth
On Tuesday, July 2, 2013, Sznajder ForMailingList bs4mailingl...@gmail.com
wrote:
Thanks a lot Markus!
Where do we define this parameter, please?
Benjamin
On Tue, Jul 2, 2013 at 4:28 PM, Markus Jelsma
Which version of Nutch are you using please?
On Tuesday, July 2, 2013, Christian Nölle noe...@uni-wuppertal.de wrote:
Hi everybody,
I got a problem concerning solrdedup. We got a field digest in solr,
solrindex-mapping for digest is fine as well, but there is no field
digest showing up in the
directory?
Benjamin
On Tue, Jul 2, 2013 at 5:10 PM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
in mapred-site.xml
It is your Mapreduce configuration override.
hth
On Tuesday, July 2, 2013, Sznajder ForMailingList
bs4mailingl...@gmail.com
wrote:
Thanks a lot Markus!
Where do
Renato Marroquín Mogrovejo (nb)
Sebastian Nagel
Julien Nioche
Lewis John McGibbney
[ ] +/-0, fine, but consider to fix few issues before...
[ ] -1, nope, because... (and please explain why)
WOW Great VOTE'ing turn out. Thank you so much to everyone for reviewing
this RC. I will progress
Sorry team, this should have been a [RESULT] thread.
Thanks
Lewis
On Tue, Jul 2, 2013 at 9:08 AM, Lewis John Mcgibbney
lewis.mcgibb...@gmail.com wrote:
In the famous words of Truman good morning, good afternoon, good evening
and good night... to all Nutch'ers!!!
I would like to bring
Good Afternoon Everyone,
The Apache Nutch PMC are very pleased to announce the immediate release of
Apache Nutch v2.2.1, we advise all current users and developers of the 2.X
series to upgrade to this release ASAP.
Apache Nutch is an open source web-search software project. Stemming
from Apache
Hi,
Please try
*http://s.apache.org/mo*
Specifically the generate.max.count property.
Many many URLs are unfetched here... look into the logs and see what is
going on. This is really quite bad and there is most likely one/a small
number of reasons which ultimately determine why so many URLs are
Hi,
On Tue, Jul 2, 2013 at 3:53 PM, h b hb6...@gmail.com wrote:
So, I tried this with the generate.max.count property set to 5000, rebuild
ant; ant jar; ant job and reran fetch.
It still appears the same, first 79 reducers zip through and the last one
is crawling, literally...
Sorry I
Hi Avilash,
It is extremely difficult to comment here.
We need information on what's actually happening. Your description is a bit
of a black box. Can you please look in hadoop.log and solr logs as well.
This will give you an indication of how many documents are/were written
down to Solr.
thank you
Is there a temporary file within the urls directory,
something like seed.txt~ ?
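
A quick way to check for such leftovers is sketched below in Python. It assumes the concern is that every file in the seed directory gets read, backup copies included; the find_backup_files helper and the temporary directory standing in for urls/ are illustrative only.

```python
from pathlib import Path
import tempfile

def find_backup_files(seed_dir):
    """Return names of editor backup files (ending in '~') in seed_dir, sorted."""
    return sorted(
        p.name
        for p in Path(seed_dir).iterdir()
        if p.is_file() and p.name.endswith("~")
    )

# Demonstration on a throwaway directory standing in for urls/.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "seed.txt").write_text("http://example.org/\n")
    (Path(tmp) / "seed.txt~").write_text("http://example.org/old\n")
    leftovers = find_backup_files(tmp)

print(leftovers)
```

Pointing find_backup_files at the real urls directory lists anything that should be deleted before injecting again.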
On Monday, July 1, 2013, h b hb6...@gmail.com wrote:
Hi,
I started to inspect the content of the crawled html.
I have 2 urls in my seed.txt. So I should just have 2 documents in my solr
response, right? I dropped
Yes, it's as simple as that. The JobTracker takes care of delegation of
tasks, therefore there is no need for Nutch to be present on every node.
Hadoop and HBase (or whichever backend you choose) is a different case.
On Sunday, June 30, 2013, Tejas Patil tejas.patil...@gmail.com wrote:
I have never