Hi,
I've written my own plugin that's doing some custom parsing.
I've needed language parsing in that plugin and the language-identifier
plugin is working great for my needs.
However, I can't use the language identifier plugin as it is, since I want
to parse only a small portion of the webpage.
Zanzico Gioele
Senior Web Analyst
VitecGroup - Division / Unit
Tel +39 0424 07
Fax +39 0424 808999
www.vitecgroup.it http://www.vitecgroup.it/
P Respect the environment: don't print this e-mail, if not necessary.
On Mon, 2 Nov 2009 at 09:48 +0100, Zanzico Gioele wrote:
Hello,
There is no administrator, but you can unsubscribe yourself. On
the Nutch mailing list information page
http://lucene.apache.org/nutch/mailing_lists.html
you can find the following E-Mail address:
nutch-user-unsubscr...@lucene.apache.org
Then your unsubscribe requests should
Kalaimathan Mahenthiran wrote:
I forgot to add one detail:
the segment I'm trying to run updatedb on has 1.3 million URLs fetched
and 1.08 million URLs parsed.
Any help with this would be appreciated.
On Sun, Nov 1, 2009 at 11:53 PM, Kalaimathan Mahenthiran
matha...@gmail.com
Eran Zinman wrote:
Hi,
I've written my own plugin that's doing some custom parsing.
I've needed language parsing in that plugin and the language-identifier
plugin is working great for my needs.
However, I can't use the language identifier plugin as it is, since I want
to parse only a small
On Mon, 2 Nov 2009 at 10:04 +0100, Heiko Dietze wrote:
Hello,
There is no administrator, but you can unsubscribe yourself. On
the Nutch mailing list information page
http://lucene.apache.org/nutch/mailing_lists.html
you can find the following E-Mail address:
Hi Andrzej,
Thank you so much! That worked like a charm!
I've spent so much time trying to figure this out and you helped me solve it
in 5 min!
Thanks!
Eran
On Mon, Nov 2, 2009 at 11:13 AM, Andrzej Bialecki a...@getopt.org wrote:
Eran Zinman wrote:
Hi,
I've written my own plugin that's
Nico Sabbi wrote:
On Mon, 2 Nov 2009 at 10:04 +0100, Heiko Dietze wrote:
Hello,
There is no administrator, but you can unsubscribe yourself. On
the Nutch mailing list information page
http://lucene.apache.org/nutch/mailing_lists.html
you can find the following E-Mail
On Mon, 2 Nov 2009 at 10:47 +0100, Andrzej Bialecki wrote:
Nico Sabbi wrote:
On Mon, 2 Nov 2009 at 10:04 +0100, Heiko Dietze wrote:
Hello,
There is no administrator, but you can unsubscribe yourself. On
the Nutch mailing list information page
Andrzej Bialecki wrote:
doesn't work, as reported by me and others last week.
Thanks,
Did you get the message with the subject "confirm unsubscribe from
nutch-user@lucene.apache.org", and did you respond to it from the same
email account that you were subscribed from?
.. I just verified
Hi, thanks for responding.
Tomcat is working just as it should, and so is the crawl.
Not connecting to my search DB seems to be the problem,
because I get 0 results out of 0, and that is impossible since there should
be crawl data.
I was thinking I did something wrong with the nutch-site.xml
On Mon, 2 Nov 2009 at 11:00 +0100, Andrzej Bialecki wrote:
Andrzej Bialecki wrote:
doesn't work, as reported by me and others last week.
Thanks,
Did you get the message with the subject "confirm unsubscribe from
nutch-user@lucene.apache.org", and did you respond to it
Thanks for all the replies...
Okay, I think there is some issue too.
I'm running Nutch out of the box, using Nutch release 1.0. I'm
running this in local mode.
The number of reduce tasks is the default configured by Nutch.
The DB size is approximately 860 MB.
I know the
I'm very new at this, so forgive my novice questions. I'm trying to
install nutch in WebSphere 6.1. While I can see that others have done this
before, I've been unsuccessful. I keep getting this error:
Error 500: java.lang.Error: java.lang.NoClassDefFoundError:
org.apache.jsp._search (wrong
not having received a response from mailman I can't proceed to step 3
Have you checked the junk mail filters and stuff like that? Perhaps
the message is getting deleted/removed/hidden before you get it...
Hi again
I know the process is not stuck: it is running, because I
turned on the Hadoop logs and I can see entries being written to them.
I'm not sure how to check if the task is completely stuck or not...
Run jps to identify the process ID, then run jstack <id> several times to see if
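The jps/jstack check suggested above could be scripted roughly as follows. This is only a sketch: it assumes the JDK's jstack tool is on the PATH, the helper names (take_dumps, stack_lines, looks_stuck) are made up for illustration, and treating identical stack frames across dumps as "stuck" is a heuristic, not proof.

```python
import subprocess
import time


def take_dumps(pid, count=3, interval=5.0):
    """Run `jstack <pid>` `count` times, `interval` seconds apart."""
    dumps = []
    for i in range(count):
        result = subprocess.run(
            ["jstack", str(pid)],
            capture_output=True, text=True, check=True,
        )
        dumps.append(result.stdout)
        if i < count - 1:
            time.sleep(interval)
    return dumps


def stack_lines(dump):
    """Keep only the stack-frame lines ('at ...'), dropping headers and timestamps."""
    return [
        line.strip()
        for line in dump.splitlines()
        if line.strip().startswith("at ")
    ]


def looks_stuck(dumps):
    """True if at least two dumps were taken and all show exactly the same frames."""
    frames = [stack_lines(d) for d in dumps]
    return len(frames) > 1 and all(f == frames[0] for f in frames[1:])
```

Usage would be something like looks_stuck(take_dumps(12345)), where 12345 stands for the process ID reported by jps. If the frames change between dumps, the job is most likely still making progress.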
I have a lot of space left on /tmp. I don't have a separate partition
for /tmp, just a folder called /tmp. There is a lot of space
left, close to 1.3 terabytes:
            1.4T   55G  1.3T   5% /
tmpfs       3.8G     0  3.8G   0% /lib/init/rw
varrun      3.8G
Why is nutch writing /tmp/hadoop-[userid] files, and how can I stop it
doing that?
--
http://www.linkedin.com/in/paultomblin
http://careers.stackoverflow.com/ptomblin
Hi,
I got the following exception in my datanode log file while updating the DB.
2009-11-03 04:39:24,273 ERROR datanode.DataNode -
DatanodeRegistration(192.168.101.152:50010,
storageID=DS-1706374374-192.168.101.152-50010-1255721446274,
infoPort=50075, ipcPort=50020):DataXceiver
java.io.EOFException:
Hello everyone,
Here is more info about the exception.
I got the exception on both slave nodes.
2009-11-03 04:39:24,273 ERROR datanode.DataNode -
DatanodeRegistration(192.168.101.152:50010,
storageID=DS-1706374374-192.168.101.152-50010-1255721446274,
infoPort=50075, ipcPort=50020):DataXceiver
Hi,
Can anyone please let me know how to make Nutch crawl within a subcategory
of a URL?
For example, I want to crawl only within the Computers & Internet category of
answers.yahoo.com. How do I do that with Nutch?
URL:
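One common way to restrict a crawl to part of a site is a URL filter. Nutch's regex URL filter files (for example conf/regex-urlfilter.txt) hold one rule per line: a "+" (accept) or "-" (reject) sign followed by a regular expression, and the first matching rule decides. The Python sketch below only imitates that matching logic so rules can be tried out before a crawl. The host and path in RULES are placeholders, not the real category URL, and the helper names are made up for illustration.

```python
import re

# Hypothetical rules in the style of a Nutch regex URL filter file.
# The category path is a placeholder, not the real Yahoo! Answers URL.
RULES = [
    "+^http://answers\\.example\\.com/dir/computers-internet/",
    "-.",  # reject everything else
]


def compile_rules(lines):
    """Turn '+regex' / '-regex' lines into (accept, pattern) pairs."""
    compiled = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, regex = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        compiled.append((sign == "+", re.compile(regex)))
    return compiled


def accepts(url, rules):
    """Return the verdict of the first rule whose regex is found in the URL."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # no rule matched: treat the URL as rejected
```

With these rules, accepts("http://answers.example.com/dir/computers-internet/q1", compile_rules(RULES)) is true, while URLs under any other path or host are rejected. The seed list would then contain only the category's start page.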