Hi,
I can only confirm that with Indigo the command line formatter fails on source
files using generics.
But when launched from the GUI it works. I imported eclipse-codeformat.xml (Properties > Java
Code Style > Formatter) and ran it on the project node (Source > Format). 292 files have been
Hi Lewis,
We'll try to get this sorted out in due course
that would be great!
maybe you find it on Google images?
I tried hard, but I didn't find it.
Thanks, Sebastian
On 11/18/2011 05:52 PM, Lewis John Mcgibbney wrote:
Hi Sebastian,
This has happened recently with quite a lot of
Hi,
images and attachments of wiki pages are again viewable.
Thanks!
But I found that pages are now immutable (at least for me).
According to http://wiki.apache.org/nutch/ContributorsGroup
I would like to get permission to edit wiki pages.
My wiki user name: SebastianNagel
Bye,
Sebastian
The tutorial will need to be updated to reflect this change. Are you
volunteering?
ok
On 05/24/2012 09:27 AM, Julien Nioche wrote:
Hi Seb
Moved to dev@ as more relevant
[...]
- bin should only have the content of runtime/local/
What about runtime/deploy/, esp. nutch-1.5.job?
Does it mean
and
smaller improvements on the 1.x branch, and some documentation.
Cheers,
Sebastian
On 05/25/2012 05:56 PM, Julien Nioche wrote:
Dear all,
It is my pleasure to announce that Sebastian Nagel has joined the Nutch PMC
and is a new committer. Sebastian, would you mind telling us about
yourself
Hi Lewis,
Minor nitpick: the directory /runtime is not necessary as it is built with
Ant. Removing it would massively reduce the size of the archive.
This also applies to the docs/ folder (15 MB uncompressed)
for the bin package:
+1
Sebastian
...
Thanks
Lewis
On 05/31/2012 10:37 PM, Lewis John Mcgibbney wrote:
Good Evening Everyone,
A candidate for the Apache Nutch 1.5 RC4 is available at:
http://people.apache.org/~lewismc/apache-nutch-1.5-rc4/
The release candidate is a src.zip, bin.zip, src.tar.gz and bin.tar.gz
archive of
Hi,
bin/nutch (help/overview) still contains the option -core
which is currently (trunk) not functional:
% bin/nutch -core parsechecker
Unrecognized option: -core
Could not create the Java virtual machine.
It's broken since https://issues.apache.org/jira/browse/NUTCH-843
But is it still
Hi Lewis,
my first steps with 2.0 (to be continued, still struggling).
Two points (I'll try to give a final vote tomorrow):
1. Some guidance would be nice. README.txt points
to http://wiki.apache.org/nutch/NutchTutorial which refers to 1.x
(I'm using
Hi Lewis,
Please see http://wiki.apache.org/nutch/Nutch2Tutorial which is an
update of Julien's (I think) page on GORA_HBase. This will get you
rocking with HBase. The changes between Cassandra, Accumulo and the
other data stores are fairly trivial.
I managed to perform a crawl with 2.0.
We only supply src distributions...
Does this principle apply to Nutch 2 as well?
Maybe, yes.
The situation with the current binary package is uncomfortable:
I had to copy/link gora-hbase and hbase jars into lib/ to get nutch running.
2012/6/13 Lewis John Mcgibbney lewis.mcgibb...@gmail.com
+1
with a documentation issue about the dependencies:
simply copy its HBase core lib from the HBase installation into the
local/lib directory. This works for me.
Removing lib/hbase-0.90.4.jar and copying hbase-0.94.jar from the HBase
installation
into lib/ caused an
Exception in thread "main"
Plugins are registered multiple times in
- build.xml
- src/plugin/build.xml
- default.properties
This is error-prone and there are already some inconsistencies (trunk):
- build.xml:
  lib-http (given twice in target release)
  urlfilter-prefix (given twice in target release)
- default.properties:
+1
Looks perfect and runs.
Great work, Lewis!
On 07/07/2012 11:07 PM, Lewis John Mcgibbney wrote:
*PING*
Hi Everyone,
I know there have been a good few threads going around with a power of
release candidates but I wonder if it is possible to get some feedback
on the 1.5.1RC#3 below.
Hi,
I just discovered that some jar files
in the bin package (1.5.1) and also in nutch.job
are packed twice:
2 commons-logging-1.1.1.jar lib parse-tika
2 geronimo-stax-api_1.0_spec-1.0.1.jar lib parse-tika
2 tagsoup-1.2.1.jar parse-html parse-tika
2
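To spot such duplicates systematically, a listing like the one above (count, jar name, folders) can be produced by grouping jar files by name. A minimal Java sketch, assuming the package has already been unpacked and its jar paths collected as relative strings; the class name and the example paths are hypothetical:

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Sketch: reports jar files that occur in more than one folder of an
 * unpacked package. Input paths are assumed to look like "lib/x.jar".
 */
class DuplicateJars {

    /** Maps each jar name to the folders containing it; keeps only duplicates. */
    static Map<String, List<String>> findDuplicates(List<String> jarPaths) {
        Map<String, List<String>> foldersByJar = new TreeMap<>();
        for (String jarPath : jarPaths) {
            Path path = Paths.get(jarPath);
            Path parent = path.getParent();
            // use only the name of the directly enclosing folder, e.g. "parse-tika"
            String folder = (parent == null) ? "" : parent.getFileName().toString();
            foldersByJar.computeIfAbsent(path.getFileName().toString(), k -> new ArrayList<>())
                        .add(folder);
        }
        // drop jars that were found only once
        foldersByJar.values().removeIf(folders -> folders.size() < 2);
        return foldersByJar;
    }

    public static void main(String[] args) {
        // hypothetical listing of an unpacked bin package
        List<String> jars = List.of(
                "lib/commons-logging-1.1.1.jar",
                "plugins/parse-tika/commons-logging-1.1.1.jar",
                "plugins/parse-html/tagsoup-1.2.1.jar",
                "plugins/parse-tika/tagsoup-1.2.1.jar",
                "lib/only-once.jar");
        // prints "2 commons-logging-1.1.1.jar lib parse-tika" and
        // "2 tagsoup-1.2.1.jar parse-html parse-tika"
        findDuplicates(jars).forEach((jar, folders) ->
                System.out.println(folders.size() + " " + jar + " " + String.join(" ", folders)));
    }
}
```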
Great.
On 09/18/2012 10:57 PM, Lewis John Mcgibbney wrote:
Hi Seb,
I totally forgot about this. I will forward port to 2.1 branch before
pushing the release.
Thanks
Lewis.
On Tue, Sep 18, 2012 at 9:52 PM, sna...@apache.org wrote:
Author: snagel
Date: Tue Sep 18 20:52:08 2012
New
+1
* package looks good
* sample crawl runs like a charm
On 09/21/2012 05:07 PM, Lewis John Mcgibbney wrote:
Hi Everyone,
A candidate for Apache Nutch 2.1 is available at:
http://people.apache.org/~lewismc/apache-nutch-2.1
The release candidate is a src.zip and src.tar.gz ONLY
archive
Forgot to say:
I've run the test crawl with HBase 0.90.5
On 10/01/2012 04:34 PM, Julien Nioche wrote:
Would be good to get thumbs-up from people who've tested crawls on other
backends (Cassandra, HBase) before pushing the release. I can't really
give a +1 as I've just checked the most obvious
Hi Cesare,
hmhh... Good catch!
The modifiedTime is also set in CrawlDbReducer.reduce
right after FetchSchedule.setFetchSchedule is called and the signature
hasn't changed compared to the previous fetch, cf. NUTCH-1341.
At a first glance, it looks like the modifiedTime is indeed never set
with
+1 to release
Now we can keep to the 6-month cycle.
Chris is right: If we manage to address a couple
of the critical issues early next year, we can release earlier.
Sebastian
On 11/22/2012 06:43 PM, Mattmann, Chris A (388J) wrote:
Release early, release often :)
I'd say I'd be happy to try and
+1
- source package builds, tests pass
- successful test crawl with bin package (20+ URLs, Linux, local mode, Solr 3.6)
On 11/23/2012 03:24 PM, lewis john mcgibbney wrote:
Hi Everyone,
A candidate for the Apache Nutch 1.6 RC#1 is available at:
Hi Markus,
this would mean that urlfilter and urlnormalizer plugins are accessed from
parse plugins.
At first glance, that sounds somewhat odd. But it's already the case for the
feed parser.
We would have to do it for all parse plugins. Since there are not so many, that's no
argument against it.
Hi Markus,
we should be fine right?
Yes, even better: FeedParser only contains URLNormalizers and URLFilters
objects which get the
references to plugin instances themselves via ObjectCache in the constructor.
Btw., that's also the way the parse filter plugins are referenced,
eg. TikaParser -
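The pattern described here (build the filter and normalizer chains once, then hand out shared references) can be sketched as follows. This is a hypothetical, self-contained class illustrating the idea only; Nutch's own ObjectCache has its own API, which this sketch does not reproduce:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of the caching pattern: an expensive object (standing
 * in for a chain of URL filters/normalizers) is built once per key and then
 * shared by all parser instances.
 */
class SimpleObjectCache {
    private final Map<String, Object> cache = new ConcurrentHashMap<>();

    @SuppressWarnings("unchecked")
    <T> T getOrCreate(String key, Supplier<T> factory) {
        // computeIfAbsent calls the factory at most once per absent key
        return (T) cache.computeIfAbsent(key, k -> factory.get());
    }

    public static void main(String[] args) {
        SimpleObjectCache cache = new SimpleObjectCache();
        AtomicInteger builds = new AtomicInteger();
        Supplier<Object> buildFilters = () -> {
            builds.incrementAndGet(); // count how often the "expensive" setup runs
            return new Object();
        };
        // two "parser constructors" asking for the same chain
        Object first = cache.getOrCreate("urlfilters", buildFilters);
        Object second = cache.getOrCreate("urlfilters", buildFilters);
        System.out.println("same instance: " + (first == second) + ", built " + builds.get() + " time(s)");
    }
}
```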
Hi Lewis,
+1
it's time: May for 2.2 and beginning of June for 1.7 to adhere to the 6-month
release cycle.
After sorting major/critical issues for 1.7 with patches available, I've found:
NUTCH-1245
NUTCH-1342
NUTCH-1430
NUTCH-926
NUTCH-1334
NUTCH-1467
which are worth committing. I'll
Does the property plugin.includes include urlfilter-prefix?
By default, only urlfilter-regex is included.
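Since plugin.includes is a regular expression over plugin ids, whether a plugin is active can be checked like this. A sketch assuming the whole plugin id has to match the expression; the includes value is an illustrative example, not the shipped default:

```java
import java.util.regex.Pattern;

/**
 * Sketch: is a plugin id activated by a plugin.includes value?
 * Assumes the whole id must match the regular expression.
 */
class PluginIncludesCheck {

    static boolean isIncluded(String pluginIncludes, String pluginId) {
        // matches() succeeds only if the expression covers the entire id
        return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // illustrative value only, not copied from nutch-default.xml
        String includes = "protocol-http|urlfilter-regex|parse-(html|tika)|index-basic";
        System.out.println(isIncluded(includes, "urlfilter-regex"));  // true
        System.out.println(isIncluded(includes, "urlfilter-prefix")); // false
    }
}
```

To activate the prefix filter as well, the expression would need to cover it, e.g. with urlfilter-(regex|prefix).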
On 04/22/2013 06:28 PM, naveen shukla wrote:
Hi All,
I got a runtime exception when I ran the following command
*nutch org.apache.nutch.net.URLFilterChecker -filterName
Now, in order to get or save the files in their actual format, in your
case, .flv or .epub files, you will have to write additional program (for
example in Java).
No, you don't have to: the plugin parse-tika can parse .epub and .flv
- see http://tika.apache.org/1.2/formats.html
- test it, eg:
Hi,
please take care not to remove the fix version
when applying bulk changes, e.g., 2.2 -> 2.3.
Alternative fix versions (1.7) are not kept.
Luckily, Jira is quite powerful: I restored the 1.x
fix version using this awful filter:
project = NUTCH AND fixVersion in (2.3)
AND status = Open AND
+1 (test with hbase)
On 06/01/2013 01:17 AM, lewis john mcgibbney wrote:
Good Friday Everyone,
Glad to get to a stage where we can VOTE on the release of the Apache Nutch
2.2 artifacts.
We solved a stack of issues:
http://s.apache.org/LPB
SVN source tag:
+1 go ahead!
Sebastian
On 06/08/2013 11:53 PM, Lewis John Mcgibbney wrote:
Thread says it all troops.
Best
Lewis
Hi Tejas,
you should be able to add images as attachments:
there is a tab/link to the left of "More Actions:".
Cheers,
Sebastian
On 06/11/2013 01:30 AM, Tejas Patil wrote:
Hi @nutch-dev,
I want to put out this [0] tutorial over Nutch wiki.
1. Do you see anything wrong in it or any improvements ?
Hi Richard,
if I understood right, parse-tika does the job well?
Extract content + all links including anchor texts?
1. The plugin parse-tika seems indeed better maintained
than feed.
2. The plugin feed is special as it treats an RSS file as
- one master document (the rss file)
- many sub-documents
Hi Carmen,
I was wondering whether the message I sent a little while ago has been seen?
Yes, looks like. Sorry.
The user
CarmenKlaussner
has been added to contributors group. Following the common practice I
left no space in the user name. Is that ok?
Cheers,
Sebastian
On 08/29/2013 11:15
Hello Ivan,
Where are the logs? I expected to see them on the console output while
running the hadoop jar nutch.job. Maybe that code is executing on the
DataNode??
Yes, these logs should be on the nodes where the tasks have been run.
Search for "hadoop log location"; the answer may depend on the
Hi,
recently I got some IO exceptions when reading older segments
with recent trunk builds. Did anyone make similar observations?
According to the stack trace, it seems possible that NUTCH-1622
makes segments' parse_data incompatible between versions?
Thanks,
Sebastian
java.io.IOException: IO
Thanks, good to know.
We should add a warning to release notes and CHANGES.txt.
Ideally, of course, reading segments should be backward compatible.
Sebastian
2013/9/17 Markus Jelsma markus.jel...@openindex.io
Yes, we've got trouble with it too (similar exception), but we did not
sync our
Hi,
links from http://nutch.apache.org/ to nightly API Docs
https://builds.apache.org/job/Nutch-trunk/javadoc/
https://builds.apache.org/job/Nutch-nutchgora/javadoc/
are broken.
Is it still generated?
Does anyone know how to fix it?
Thanks,
Sebastian
Hi,
+1 to release soon (this year, or early next year)
and probably a few others but they could also be done later.
At least, these should be done before releasing:
NUTCH-1646 IndexerMapReduce to consider DB status
NUTCH-1413 Record response time
Sebastian
On 11/28/2013 05:49 PM, Julien
lists.digitalpeb...@gmail.com wrote:
Hi guys,
At least 2 of the issues that Seb and I had mentioned have now been
committed. What about releasing 1.8 from trunk? If so, any volunteers?
Julien
On 2 December 2013 21:02, Sebastian Nagel wastl.na
Hi Alparslan,
You can see the stats in this link
(https://developers.google.com/webmasters/state-of-the-web/). We
can develop an HTML parser plug-in to provide such an improvement.
Nice resource and nice idea.
For me that sounds like a combination of the ParserJob and the classic Hadoop
word
Hi everyone,
NUTCH-1113 and NUTCH-1706 are fixed,
broken HostDb (NUTCH-1325) has been removed for now from trunk.
No open issues marked for 1.8 are left
and everything seems to work!
Time to spin a new release candidate?
Sebastian
Hi Greg,
I am wondering whether it would be possible to integrate this kind of change
in the upstream code base?
Yes, of course. Please open an issue in Jira, ideally with a patch attached;
see:
http://wiki.apache.org/nutch/Becoming_A_Nutch_Developer#Step_Three:_Using_the_JIRA_and_Developing
Hi Yann,
In Parse type, we don't have getData() so we can't add new metadata.
...
So what is the new way to add custom field to index ? Maybe i miss
something ...
In 2.x data for custom fields can be added to the WebPage's metadata
in ParseFilter via
page.putToMetadata(Utf8 key, ByteBuffer
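The value side of that call is a ByteBuffer, so a string has to be encoded on the way in and decoded again later (e.g., in an indexing filter). A sketch of the conversion, with a plain Map<String, ByteBuffer> standing in for the WebPage metadata (in 2.x the key would be an Avro Utf8 and the call page.putToMetadata(...)); the field name is hypothetical:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Sketch: storing a custom string field as ByteBuffer and reading it back. */
class MetadataSketch {

    static ByteBuffer encode(String value) {
        return ByteBuffer.wrap(value.getBytes(StandardCharsets.UTF_8));
    }

    static String decode(ByteBuffer buffer) {
        // duplicate() so that reading does not move the position of the stored buffer
        ByteBuffer copy = buffer.duplicate();
        byte[] bytes = new byte[copy.remaining()];
        copy.get(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // stands in for the WebPage metadata map; "my.custom.field" is hypothetical
        Map<String, ByteBuffer> metadata = new HashMap<>();
        metadata.put("my.custom.field", encode("some value"));
        System.out.println(decode(metadata.get("my.custom.field"))); // some value
    }
}
```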
Hi,
does it mean you are (also) addressing NUTCH-1086? That would be great,
since this issue has been waiting for a solution for a long time!
The reason I picked version 4.1.1 and not the latest is because I noticed
it is already in the build/lib dir and I wasn't sure I can use two versions
of the jar with
Hi,
The URL validator plugin rejects this kind of URL because of "..".
I had a look at RFC 2396 and the W3C standards. There is no constraint
on ".." except for the "/../" and "/.." kinds of path segments.
Also, Unix systems accept file names containing two dots, e.g., abc..xyz.txt.
urlfilter-validator should be relaxed to
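The relaxed rule suggested here (reject ".." only as a complete path segment) could look roughly like this. A hypothetical helper, not the actual urlfilter-validator code, assuming java.net.URI can parse the URL:

```java
import java.net.URI;

/**
 * Hypothetical helper: flags a URL only if its path contains ".." as a
 * complete segment ("/../" or a trailing "/.."), so "abc..xyz.txt" passes.
 */
class DotDotCheck {

    static boolean hasDotDotSegment(String url) {
        // URI.create throws IllegalArgumentException for unparsable URLs
        String path = URI.create(url).getRawPath();
        if (path == null) {
            return false; // opaque URIs (e.g. "mailto:...") have no path
        }
        for (String segment : path.split("/")) {
            if (segment.equals("..")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(hasDotDotSegment("http://example.com/abc..xyz.txt")); // false
        System.out.println(hasDotDotSegment("http://example.com/a/../b.html"));  // true
    }
}
```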
eclipse: change ivy.xml, close the Eclipse project,
call ant eclipse, open the project again and press F5 (Refresh)
Sebastian
Hi Diaa,
on Windows, Cygwin is required as a prerequisite to run Nutch
(or other Hadoop-based applications). Cygwin will provide
the program chmod. See:
http://wiki.apache.org/nutch/GettingNutchRunningWithWindows
Sebastian
On 04/23/2014 04:57 PM, Diaa Abdallah wrote:
Hi,
Is there a way to debug
Hi Diaa,
Why doesn't nutch assume that web links that have www. at the beginning are
of the http protocol?
It would not be a big problem to do so. The url normalizer provides scopes
(inject, fetch, etc.): you only have to point the property
urlnormalizer.regex.file.inject to a special
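A rule of that kind only has to prepend the scheme to URLs starting with "www.". The Java equivalent of such a regex rule, as a sketch: the pattern and substitution are assumptions, it assumes the scheme-less URL actually reaches the inject-scope normalizer, and in Nutch the rule would go as a pattern/substitution pair into the rules file referenced by urlnormalizer.regex.file.inject:

```java
import java.util.regex.Pattern;

/** Sketch of an inject-scope rule: prepend "http://" to URLs that start with "www.". */
class WwwPrefixRule {

    // assumed rule: only scheme-less URLs beginning with "www." are touched
    private static final Pattern WWW_WITHOUT_SCHEME = Pattern.compile("^(www\\.)");

    static String normalize(String url) {
        // $1 re-inserts the matched "www." after the new scheme
        return WWW_WITHOUT_SCHEME.matcher(url).replaceFirst("http://$1");
    }

    public static void main(String[] args) {
        System.out.println(normalize("www.example.com/"));         // http://www.example.com/
        System.out.println(normalize("https://www.example.com/")); // https://www.example.com/ (unchanged)
    }
}
```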
Hi Talat,
At present, our scoring plugin architecture doesn't permit this.
The scoring plugin interface fits into the crawler work and data flow:
links are fed into CrawlDb/Webtable, fetch lists are generated, etc.
OPIC can be used because it's online. Other link rank algorithms
define a
Hi Talat,
parse-html uses neko by default, or alternatively tagsoup.
Tagsoup is also used by parse-tika. Which parser lib is used
internally by parse-html can be set via property parser.html.impl.
It does no harm to have more libs available (if they are
compatible, also regarding the license). If
Hi Talat,
thanks for the examples. I've also observed that Neko has some problems
even with valid HTML5. Luckily, most pages do not make excessive use of the
syntactic freedom HTML5 allows (unclosed tags, omitted implicit tags).
Some problems can be easily fixed (e.g., NUTCH-1733), and since
Hi Lewis,
it seems to be related to NUTCH-1714:
WebPage-owned maps (metadata, headers, etc.) are not
initialized any more in the constructor.
This also causes other tests to fail.
The solution would be to replace
WebPage page = new WebPage();
by
WebPage page = WebPage.newBuilder().build();
Hi Lewis,
looks great!
What about the old
http://nutch.apache.org/version_control.html ?
It could be useful for users/developers not familiar
with Apache resources. Of course, the content could be updated.
Also, I miss the search box of the old version.
Shouldn't we add it again?
Sebastian
On
Hi Lewis,
a patch is ready; on my machine all tests pass now.
Currently, I'm experiencing problems with Jira, so
feel free to open and resolve the issue.
Cheers,
Sebastian
On 06/19/2014 07:58 PM, Lewis John Mcgibbney wrote:
Hi Seb,
On Thu, Jun 19, 2014 at 1:46 PM, dev-digest-h...@nutch.apache.org
+1 for a release during the next month
I plan to address before a release:
- 2 issues related to redirects
NUTCH-926 https://issues.apache.org/jira/browse/NUTCH-926
NUTCH-1708 https://issues.apache.org/jira/browse/NUTCH-1708
- issues ready for commit:
NUTCH-1605
Build and tests run successfully on my local machine.
But it repeatedly fails on ubuntu* Jenkins machines.
The error in resolve-test could be related to
- changes to test dependencies (NUTCH-1802, NUTCH-1803)
- or missing ivy libs in ant installations
Any ideas?
Sebastian
2014-07-02 6:33
/tree of test,
presumably as first dependency of compile-core-test, in parallel to
compile-core.
Right?
I'll fix it over the weekend. But if anybody is faster... You're welcome!
Cheers,
Sebastian
Hi,
I have some problems running ant targets on recent trunk:
% ant runtime
fails if run from scratch (after ant clean)
but it succeeds after ant test or ant nightly.
In a plugin folder, e.g., src/plugin/parse-metatags,
% ant test
The error causing the failure is always:
+1
sebastian
2014-07-30 10:56 GMT+02:00 Julien Nioche lists.digitalpeb...@gmail.com:
Hi Lewis
https://issues.apache.org/jira/browse/NUTCH-1755 is more at a discussion
stage and can be done
later. I have moved it to 1.10
I've just
Hi,
we're glad to announce that there will be two events dedicated to Nutch
at the upcoming ApacheCon Europe
http://events.linuxfoundation.org/events/apachecon-europe
in Budapest, November 17 - 21, 2014.
1. an introductory talk about Nutch http://sched.co/1nyYa7b
as part of the Lucene/Solr
+1
* src package: compiles, tests pass
* bin package: successfully run small test crawl and indexed to Solr
On 08/13/2014 07:31 AM, Lewis John Mcgibbney wrote:
Hi user@ dev@,
This thread is a VOTE for releasing Apache Nutch 1.9. The release candidate
comprises the following components.
Hi,
afaics, Julien is right. It's possible to check it via:
bin/nutch parsechecker -Dhttp.content.limit=-1 -dumpText \
'http://search.dangdang.com/?key=%CA%FD%BE%DD%BF%E2'
With -Dhttp.content.limit=65534 (also the default) the content
is truncated.
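The effect of http.content.limit can be pictured as a simple cut-off: content longer than the limit is truncated, and a negative value disables the limit. A sketch of these semantics only; the real protocol plugins enforce the limit inside their own response handling, which this hypothetical helper does not model:

```java
import java.util.Arrays;

/** Sketch of the content-limit semantics: truncate to limit bytes, negative means no limit. */
class ContentLimit {

    static byte[] apply(byte[] content, int limit) {
        if (limit < 0 || content.length <= limit) {
            return content; // no limit configured, or content already small enough
        }
        return Arrays.copyOf(content, limit); // keep only the first limit bytes
    }

    public static void main(String[] args) {
        byte[] page = new byte[200_000]; // stands in for a large response body
        System.out.println(apply(page, 65536).length); // 65536
        System.out.println(apply(page, -1).length);    // 200000
    }
}
```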
Best,
Sebastian
On 09/17/2014 11:32 AM,
Hi Edoardo,
To make things easy I've used the JavaMain action to execute the classes
that the nutch scripts invoke, parameterized as necessary.
Ok. That means that each step (inject, generate, fetch, etc.) runs in its
own JVM. Right?
One thing that I noticed is that I found configuring the
of -D options defined externally (either by the bash script, oozie
workflow, etc...)
What do you think?
Best,
Edoardo
Hi,
thanks for testing!
1. Is
/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
the real path, i.e., are there no symbolic links in the path?
The command
readlink -f
/Users/AngelaWang/Documents/programs/oodt/cas-curator/staging/products/xml/
should show
the indexer-solr plugin
in File:
conf/nutch-site.xml which is not mentioned there. Please add it too, so
future users could easily
follow it step by step.
Best,
Mengying (Angela) Wang
On Mon, Oct 27, 2014 at 4:29 PM, Sebastian Nagel wastl.na...@googlemail.com
mailto:wastl.na
Hi Albin,
you mean NUTCH-1870, right?
I'm in the process of reviewing your patch.
I'm just stuck preparing the boilerplate required
to integrate parse-xsl into the build, tests, and Javadoc.
I've added the jaxb dependencies to ivy,
but the xjb task fails, presumably because
of a version mismatch.
Hi Lewis,
NUTCH-1825 (protocol-http may hang for certain web pages)
- I've been running tests in production for a week (with 1.x)
I'll check for any regressions in detail and will commit in the next days.
I'm also in the process of committing
NUTCH-1483 Can't crawl filesystem with
Hi,
that's a Hadoop 1.x problem on Windows 7:
https://issues.apache.org/jira/browse/HADOOP-7682
http://mail-archives.apache.org/mod_mbox/nutch-user/201307.mbox/%3c51db1853.3040...@googlemail.com%3E
Indeed, using Linux may be the simplest solution,
simpler than downgrading or upgrading Hadoop.
Hi,
this problem is reproducible after deleting
% rm -rf ~/.ivy2/cache/org.restlet.jse
The error stated
[ivy:resolve] restlet: bad module name found in
http://maven.restlet.org/org.restlet.ext.jaxrs/2.2.3/org.restlet.ext.jaxrs-2.2.3.pom
Thanks
Hi Talat,
- AdaptiveFetchSchedule does not work. In the default settings the values are floats, but it needs
integers.
Confirmed, in nutch-default.xml these two properties are defined as floats
but read as integers. Configuration.getInt(name, defaultValue) then returns the default value.
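Why the misconfiguration goes unnoticed can be shown with a lenient getter that falls back to the default when a value does not parse as an integer, which is the behaviour described above. A self-contained sketch; the class and the property name are hypothetical, and a Map stands in for the Hadoop Configuration:

```java
import java.util.Map;

/**
 * Hypothetical stand-in for a lenient configuration getter: a value such as
 * "60.0" cannot be parsed as int, so the caller silently gets the default.
 */
class LenientConfig {

    static int getInt(Map<String, String> conf, String name, int defaultValue) {
        String value = conf.get(name);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue; // "60.0" ends up here
        }
    }

    static float getFloat(Map<String, String> conf, String name, float defaultValue) {
        String value = conf.get(name);
        if (value == null) {
            return defaultValue;
        }
        try {
            return Float.parseFloat(value.trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = Map.of("example.min_interval", "60.0"); // hypothetical property
        System.out.println(getInt(conf, "example.min_interval", 1));    // 1, not 60
        System.out.println(getFloat(conf, "example.min_interval", 1f)); // 60.0
    }
}
```

Reading such values with a float getter (or defining them as integers) would avoid the silent fallback.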
Hi,
the jetty-client-6.1.22.jar
is a dependency needed only for testing.
Consequently, it's placed in
build/test/lib/
but only if you run the tests or call
% ant resolve-test
There is also a target
% ant eclipse
which writes a complete Eclipse project configuration.
Sometimes, if
Dear all,
on behalf of the Nutch PMC it is my pleasure to announce that
Jorge Luis Betancourt Gonzalez has been voted in as committer
and member of the Nutch PMC. Jorge, would you mind telling us
about yourself, what you've done so far with Nutch, which areas
you think you'd like to get involved,
Hi Markus, hi Chris, hi Lewis,
-1 from me
A well-documented property is just an invitation to
disable robots rules. A hidden property is no
alternative either, because it will soon be documented
in our mailing lists or somewhere on the web.
And shall we really remove or reformulate
Our software
+1
- successful small test crawl with HBase 0.94.26
- verified signatures
On 01/09/2015 09:58 AM, Lewis John Mcgibbney wrote:
Hi user@ dev@,
This thread is a VOTE for releasing Apache Nutch 2.3.
Quite incredibly we addressed 143 issues as per the release report
Hi Mahmoud,
which version of Nutch 2.x is used exactly?
Are all URLs in the redirect chain really accepted by URL filters?
Do URL normalizers change URLs (esp. ;jsessionid=...)?
Thanks,
Sebastian
On 03/20/2015 10:56 PM, Mahmoud Gzawi wrote:
Hi everyone,
I have a problem with redirection
Hi,
maybe this thread is better suited to dev@tika
since it's about building Tika.
Btw., I can successfully build Tika trunk/1.8.
Looks like something system-specific, similar to TIKA-1503:
gdalinfo is installed, but fails to parse a certain file format.
Thanks,
Sebastian
On 03/22/2015 08:26 AM,
Hi,
the reason clearly lies in the URL filters. The single injected
URL does not pass the filter:
InjectorJob: total number of urls rejected by filters: 1
InjectorJob: total number of urls injected after normalization and filtering: 0
Please check which URL filters are activated via property
Hi Tizy,
this should help:
https://wiki.apache.org/nutch/HttpPostAuthentication
http://svn.apache.org/repos/asf/nutch/trunk/conf/httpclient-auth.xml.template
For more details you could also check
https://issues.apache.org/jira/browse/NUTCH-827
https://issues.apache.org/jira/browse/NUTCH-1943
Hi all,
I want to bring this onto the agenda again. It's time
- NUTCH-1925 (upgrade Tika) is done
- we have 8 remaining issues [1]
- they either are (relate to) new features
(common crawl dumper, docker file, urlnormalizer-slash)
- or minor issues or improvements which could possibly be
Hi,
yes, there is a Nutch server providing a REST API
and a web app client to run Nutch (as a result of our
participation in GSoC 2014 by Fjodor Vershinin).
There are some limitations:
- only 2.x for now (please follow NUTCH-1040 for a 1.x port)
- not complete (e.g., cannot configure a crawl)
For
Dear all,
it is my pleasure to announce that Guiseppe Totaro has joined us
as committer and member of the Nutch PMC. Congratulations on your
new role within the Apache Nutch community!
Guiseppe, would you mind telling us about yourself, and what you
are doing with Nutch, what you plan to do,
Github user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/78#discussion_r42675133
--- Diff:
src/plugin/index-links/src/java/org/apache/nutch/indexer/links/LinksIndexingFilter.java
---
@@ -0,0 +1,168 @@
+/**
+ * Licensed
+0
What about the source package *-src.zip and the tar.gz packages (*-bin.tar.gz,
*-src.tar.gz)?
The PGP key B876884A is missing in
http://www.apache.org/dist/nutch/KEYS
It is contained in
https://people.apache.org/keys/group/nutch.asc
We should
- either update the first
(it's also not in
Hi,
that might look strange but it's not a bug.
It could be improved, see below, simply because
it's not obvious - I also stumbled over this
point some time ago. It also pops up from time
to time on the mailing lists, see references below.
- when indexing the modified time (sent by the server)
Hi Chris, hi Markus,
+1 to release now / during the next days
> Going to try for a Tika 1.11 release candidate 1 today too.
Does this mean to wait until Tika has been released and to update
parse-tika as well?
> NUTCH-2064 is too important to miss another release
> especially if you are using
Github user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/78#discussion_r42160948
--- Diff:
src/plugin/index-links/src/java/org/apache/nutch/indexer/links/LinksIndexingFilter.java
---
@@ -0,0 +1,168 @@
+/**
+ * Licensed
Github user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/78#discussion_r42160759
--- Diff:
src/plugin/index-links/src/java/org/apache/nutch/indexer/links/LinksIndexingFilter.java
---
@@ -0,0 +1,168 @@
+/**
+ * Licensed
Github user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/78#discussion_r42163943
--- Diff: conf/nutch-default.xml ---
@@ -1896,4 +1896,33 @@ CAUTION: Set the parser.timeout to -1 or a bigger
value than 30, when using
Github user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/78#discussion_r42164537
--- Diff: conf/nutch-default.xml ---
@@ -1896,4 +1896,33 @@ CAUTION: Set the parser.timeout to -1 or a bigger
value than 30, when using
Github user sebastian-nagel commented on a diff in the pull request:
https://github.com/apache/nutch/pull/78#discussion_r42161108
--- Diff:
src/plugin/index-links/src/java/org/apache/nutch/indexer/links/LinksIndexingFilter.java
---
@@ -0,0 +1,168 @@
+/**
+ * Licensed
Hi,
The Injector normalizes URLs; no extra setup is required.
In case you want to have special rules for injected URLs
(e.g., strip "index.html"), it's possible to configure
a special rules file for this scope by:
urlnormalizer.regex.file.inject
regex-normalize-inject.xml
Name of the
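For the "index.html" example, the rule in such a file boils down to a regex replacement applied only in the inject scope. A Java sketch of that replacement; the pattern is an assumption and would have to be adapted to the URLs actually injected:

```java
import java.util.regex.Pattern;

/** Sketch of an inject-only rule that strips a trailing "index.html" from the path. */
class StripIndexHtmlRule {

    // assumed pattern: "/index.html" at the very end of the URL (no query or fragment)
    private static final Pattern TRAILING_INDEX_HTML = Pattern.compile("/index\\.html$");

    static String normalize(String url) {
        return TRAILING_INDEX_HTML.matcher(url).replaceFirst("/");
    }

    public static void main(String[] args) {
        System.out.println(normalize("http://example.com/docs/index.html")); // http://example.com/docs/
        System.out.println(normalize("http://example.com/index.html?x=1"));  // http://example.com/index.html?x=1
    }
}
```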
Dear all,
on behalf of the Nutch PMC it is my pleasure to announce
that Asitang Mishra has joined the Nutch team as committer
and PMC member. Asitang, please feel free to introduce
yourself and to tell the Nutch community about your
interests and your relation to Nutch.
Congratulations and
Welcome, Aron!
Thanks for the introduction
and the many links!
On 09/14/2015 04:25 PM, Aron Ahmadia wrote:
> Hi Folks,
>
> Since I'll be spending some time with the Nutch REST API and the 1.x code
> base, I figured I'd send a
> quick introduction email to the Nutch developer's mailing list.
>
Dear all,
on behalf of the Nutch PMC it is my pleasure to announce
that Sujen Shah has been voted in as committer and member
of the Nutch PMC. Sujen, would you mind introducing
yourself to the Nutch community and tell in just a few
words about your interests and your plans regarding Nutch?
Hi Manali,
please send the subscription mail to
dev-subscr...@nutch.apache.org
Thanks,
Sebastian
On 09/29/2015 10:34 PM, Manali Shah wrote:
> Hello,
>
> I would like to subscribe to the mailing list.
>
> Best,
> Manali
Hi,
yes, this is a bug which has been fixed in the commit you mentioned
but has reappeared. Sorry,
see https://issues.apache.org/jira/browse/NUTCH-2124,
you'll also find a patch there. The fix will be
included in 1.11 for sure.
Thanks,
Sebastian
On 10/03/2015 09:22 AM, Taichi Ho wrote:
> Hi,
Hi Ayesha,
you should now be able to edit the content of the Nutch wiki.
Cheers and happy editing!
Sebastian
On 09/27/2015 08:33 PM, Ayesha Sabah Hasan wrote:
> Hi,
>
> My username is: ayeshahasan and i'd like to get permission to edit the Nutch
> Wiki.
>
> Thanks,
> Ayesha