I have been able to compile under OpenJDK 11
Have not done anything further so far
I'm gonna try to get to it this evening
Greetz
Ralf
On Wed, Aug 24, 2022 at 1:29 PM Markus Jelsma
wrote:
>
> Hi,
>
> Everything seems fine, the crawler seems fine when trying the binary
> distribution. The source
so far... it doesn't select anything when creating segments:
0 records selected for fetching, exiting
On Wed, Aug 24, 2022 at 3:02 PM BlackIce wrote:
>
> I have been able to compile under OpenJDK 11
> Have not done anything further so far
> I'm gonna try to get to it t
Never mind, I made a typo...
It fetches, it parses.
On Thu, Aug 25, 2022 at 3:42 AM BlackIce wrote:
>
> so far... it doesn't select anything when creating segments:
> 0 records selected for fetching, exiting
>
> On Wed, Aug 24, 2022 at 3:02 PM BlackIce wrote:
> >
>
2:05, Sebastian Nagel wrote
> :
>
> > Hi Ralf,
> >
> > > It fetches it parses
> >
> > So a +1 ?
> >
> > Best,
> > Sebastian
> >
> > On 8/25/22 05:22, BlackIce wrote:
> > > nevermind I made a typo...
> > >
Tried some indexing... but when manually doing "invertlinks" it says
something about the input path not existing.
Has invertlinks changed since 1.18?
Greetz
RRK
On Mon, Aug 29, 2022 at 3:38 PM BlackIce wrote:
>
> Haven't indexed anything to solr.. gonna give it a shot in a
OK,
I compiled Nutch under JDK11
Did some basic fetching, parsing, linkinversion and posterior indexing to Solr 9
[+1]
Great work!
RRK
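For reference, the cycle described above can be sketched with the standard 1.x commands (the crawl/ directory layout and the Solr core name are assumptions, not the exact setup used):

```shell
# Sketch of one Nutch 1.x crawl round; paths and core name are assumptions
bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments
SEGMENT=$(ls -d crawl/segments/* | tail -1)   # newest segment
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"
bin/nutch updatedb crawl/crawldb "$SEGMENT"
bin/nutch invertlinks crawl/linkdb -dir crawl/segments
bin/nutch index -D solr.server.url=http://localhost:8983/solr/nutch \
  crawl/crawldb -linkdb crawl/linkdb "$SEGMENT"
```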
On Tue, Aug 30, 2022 at 12:22 PM BlackIce wrote:
>
> Tried some indexing... but when manually doing "invertlinks" it says
> something about
Hi,
I'm using Nutch 2.2.1, HBase 0.90.6 in pseudo-distributed mode, Hadoop
1.2.1, Oracle Java 8, Intel i5 quad-core, 16 GB RAM.
Currently the Fetch cycle is limited by my Internet connection.
Parse cycle uses an average of 10% per CPU core
Updatedb cycle uses average 3% per CPU core
Currently I'
Hi,
My first try to run Nutch in pseudo dist: when trying to run any nutch
command from the /runtime/deploy folder I get the following error:
hduser@bl4ck1c3:/usr/local/nutch2/runtime/deploy$ bin/nutch inject urls
Warning: $HADOOP_HOME is deprecated.
14/03/18 16:19:33 INFO crawl.InjectorJob: Injector
count.
> But optimization is a very general concept. You should tune the Nutch, HDFS,
> JobTracker and HBase settings.
>
> Good luck ;)
>
>
> 2014-03-18 14:00 GMT+02:00 BlackIce :
>
> > Hi,
> >
> > I'm Using Nutch 2.2.1, Hbase 0.90.6 in pseudo distributed mode , H
Hi, I managed to get Nutch 2.2.1 running in pseudo-distributed mode by
making sure all libs are the same version across the Hadoop/HBase/Nutch
ensemble.
However, now when using the crawl script, the solrdedup job fails with:
java.lang.RuntimeException: java.lang.ClassNotFoundException:
org.apache.
I skimmed this book as well,
It saves a lot of time not having to Google all the info yourself.
It also expands on some of the things, so it clarified many things for me.
It is a very good starting point for a noob like me!
I agree on the title, it's a getting-started book.
On Wed, Mar 19, 2014 at
Mar 2014 20:48, "BlackIce" wrote:
>
> > Thank you,
> >
> > what are some good starting points to start tuning?
> >
> > thnx
> >
> >
> > On Tue, Mar 18, 2014 at 8:20 PM, Talat Uyarer wrote:
> >
> > > Hi,
> >
On Thu, Mar 20, 2014 at 3:13 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi BlackIce,
>
> On Wed, Mar 19, 2014 at 3:07 PM,
> wrote:
>
> >
> > HI,
> >
> > My first try to run Nutch in pseudo dist, when trying to run any nutch
>
plugins are located. Each
> element may be a relative or absolute path. If absolute, it is used
> as is. If relative, it is searched for on the classpath.
>
>
>
> 2014-03-20 13:53 GMT+02:00 BlackIce :
>
> > Thnx Lewis, Hadoop 1.2.1
> >
(Configuration.java:810)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:855)
... 8 more
Next step: downgrade to Java 6? (I'm on 8)
On Fri, Mar 21, 2014 at 2:55 PM, BlackIce wrote:
> you mean the one located
> in /nutch/runtime/local ?
>
>
>
> On Thu, Mar 20,
Hi,
what is the correct syntax for the language-identifier plugin?
I have this in my nutch-site.xml:
plugin.includes
protocol-http|urlfilter-regex|parse-(html|tika|text)|index-(basic|anchor|more)|query-(basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|
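For anyone hitting the same question later: the language-identifier plugin is enabled by adding it to the plugin.includes property in nutch-site.xml. A hedged sketch (the value below is illustrative, not a complete list for any particular setup):

```xml
<property>
  <name>plugin.includes</name>
  <value>language-identifier|protocol-http|urlfilter-regex|parse-(html|tika)|index-(basic|anchor|more)|indexer-solr|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
</property>
```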
Any idea on when Nutch 2.3 will be released?
Thnx
Does anyone have a good nutch/solr 4.7 schema file?
Thnx
HI, playing around with Nutch 1.8 in localmode on Solr 4.7..
When indexing larger crawls 10k and up I get:
Indexer: java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:114)
at
ack trace? Probably add more debug info
> > in.
> >
> > This could be due to some disk size issue...
> >
> >
> > On Sat, May 3, 2014 at 8:51 PM, BlackIce wrote:
> >
> >> HI, playing around with Nutch 1.8 in localmode on Solr 4.7..
> >>
Hi,
what needs to be copyied over to the HDFS in Nutch 1.8? or what is the
command? when trying to run the crawl script under /runtime/deploy I get
the following:
14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: starting at 2014-05-03
14:59:03
14/05/03 14:59:03 INFO fetcher.Fetcher: Fetcher: segm
ments are named by a time-stamp, e.g.
>.../TestCrawl/segments/20140502231126/
> "crawl_generate" is a subdir.
>
> Can you specify the exact commands to run the crawler?
>
> Sebastian
>
> On 05/03/2014 08:30 PM, BlackIce wrote:
> > Hi,
> >
> > what needs
I get this error now when doing crawls at 120k each run:
2014-05-04 11:56:44,549 INFO crawl.CrawlDb - CrawlDb update: starting at
2014-05-04 11:56:44
2014-05-04 11:56:44,549 INFO crawl.CrawlDb - CrawlDb update: db:
TestCrawl/crawldb
2014-05-04 11:56:44,549 INFO crawl.CrawlDb - CrawlDb update: s
Thnx
On Wed, May 7, 2014 at 4:07 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi BlackIce,
>
> On Sat, May 3, 2014 at 10:52 PM,
> wrote:
>
> >
> > Does anyone have a good nutch/solr 4.7 schema file?
> >
> >
> > What about t
I just installed Nutch 2.x from SVN and the Solr indexer is not working; my
guess is that it has to do with Solrindexer now being a plug-in, so I
activated it in the plug-ins (same as in 1.8).
When trying to run crawl script I get:
Indexing TestCrawl12 on SOLR index -> http://localhost:8983/solr
Ind
mailing list seems to have been a bit screwy
reply to the other Nutch 2.x question:
I have httpcore-4.2.5.jar
with what/where does it have to match?
thnx
On Thu, May 8, 2014 at 1:40 AM, BlackIce wrote:
> If someone could explain to me how to get the code from there
>
>
>
httpcore-4.2.5
where would I look to make sure it's the right one?
On Mon, May 12, 2014 at 5:48 PM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi BlackIce,
>
>
> On Sun, May 11, 2014 at 9:20 AM,
> wrote:
>
> >
> > Subject: Nutch 2.x f
I'm on it ;)
On Wed, May 7, 2014 at 4:05 AM, Lewis John Mcgibbney <
lewis.mcgibb...@gmail.com> wrote:
> Hi BlackIce,
>
> On Sat, May 3, 2014 at 10:52 PM,
> wrote:
>
> >
> >
> > Any idea on when Nutch 2.3 will be released?
> >
>
You are correct. I did some research and found it to be a Tika issue; it
is fixed by setting the "title" field to multivalued in schema.xml. I think
the default Nutch schema should be updated accordingly!
Thnx
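For anyone searching later, the fix amounts to a one-line change in the Solr schema.xml; a sketch (the field type name is an assumption, match your existing definition):

```xml
<field name="title" type="text_general" indexed="true" stored="true" multiValued="true"/>
```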
On Sat, May 3, 2014 at 8:27 PM, BlackIce wrote:
> Bad Reques
If someone could explain to me how to get the code from there
On Thu, May 8, 2014 at 1:39 AM, BlackIce wrote:
> I'm on it ;)
>
>
> On Wed, May 7, 2014 at 4:05 AM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> wrote:
>
>> Hi BlackIce,
>>
>
"Title" field needs to be set to multivalued - a Tika issue; Tika may return
multiple values for title on PDFs.
On Thu, May 8, 2014 at 1:37 AM, BlackIce wrote:
> Thnx
>
>
> On Wed, May 7, 2014 at 4:07 AM, Lewis John Mcgibbney <
> lewis.mcgibb...@gmail.com> w
sponse. I create a
> issue for this.
>
> Talat
> On 17 May 2014 04:59, "BlackIce" wrote:
>
> > "Title" field needs to be set to multivalued - Tika issue, Tika may
> return
> > multiple values for title on PDFs
> >
> >
Hi,
I'm thinking of securing Solr a bit, and I'm finding that there are several
ways of doing this. Anyone have any experience with the
authentication for Solr in Nutch? Which type of Solr security does one use
with Nutch 1.9?
Thnx
olr server user
>
>
>
>
> <property>
>   <name>solr.auth.password</name>
>   <value>password</value>
>   <description>Solr server password</description>
> </property>
>
> Done!
> Nutch uses the password to index into Solr.
> I hope this helps you.
>
> This post was very useful for me.
>
> http://communi
Hi,
We have our search engine now as Beta 0.1 at www.enlle.com.
We are using Nutch 1.9 to crawl the web and index data to Solr.
Currently we are at over 4 million records, which will increase
dramatically every day!
It has occurred to me that we will be tweaking Solr frequently in order to
improv
I was just going through the Nutch 2.3 Ivy config and saw that it can use Solr as a
backend; has anyone tried this? If so, is it better than HBase?
Also the Gora site says that Gora 0.5 in Nutch 2.3 can use: Apache Hadoop
1.0.1 and 2.4.0, Apache HBase 0.94.14.
Anyone tried this?
Thnx
cannot write to HBase
> that runs on top of Hadoop 2.x. If you prefer to use HBase on Hadoop 2 you
> should use Gora 0.6
>
> HTH
>
> 2015-05-15 3:47 GMT+03:00 BlackIce :
> > I was just going trough the NUtch 2.3 IVY that it can use Solr as a
> > backend, anyone have tried this? if s
Hi Group,
I just received a complaint from my ISP stating that my "server" was
attacking someone's firewall. My guess is that I had Nutch crawling too
aggressively. My question is: what are "best practices" in order to avoid
such problems?
han once every 2+ seconds, but 5+ seconds is better. Also, do not select
> over 500+ records for a host for each generation cycle. These guidelines
> keep you safe almost all the time. Faster is possible though.
>
> M.
>
> -Original message-
> From: BlackIce
> Sent:
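Markus's guidelines map onto standard nutch-site.xml properties; a sketch with the values he suggests:

```xml
<!-- wait at least 5 seconds between requests to the same host -->
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value>
</property>
<!-- count the generate limit per host, capped at 500 records -->
<property>
  <name>generate.count.mode</name>
  <value>host</value>
</property>
<property>
  <name>generate.max.count</name>
  <value>500</value>
</property>
```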
> -Original message-----
> From: BlackIce
> Sent: Wednesday 18th November 2015 20:51
> To: user@nutch.apache.org
> Subject: Complaint from a crawled website!
>
> Hi Group,
>
> I just received a complaint from my ISP stating that my "server" was
> attacking som
But what does it have to do with anything, that MY machine is filtered via
iptables?
On Wed, Nov 18, 2015 at 10:43 PM, BlackIce wrote:
> My ISP has shutdown my site without prior notice
>
> On Wed, Nov 18, 2015 at 10:38 PM, Markus Jelsma <
> markus.jel...@openindex.io> wrote:
>
Hi,
Did I miss anything? I can't get the index metatags to work in 1.11 ...
No error message, no data in solr 5.3.1
Any ideas? Thnx!
plugin.includes
language-identifier|protocol-http|urlfilter-regex|parse-(html|tika|metatag)|index-(basic|anchor|more|metadata)|indexer-solr|scoring-opic|urln
Amazing how a little typo can drive one nuts for days
On Fri, Dec 11, 2015 at 10:14 PM, BlackIce wrote:
> Hi,
>
> Did I miss anything? I can't get the index metatags to work in 1.11 ...
>
> No error message, no data in solr 5.3.1
>
> Any ideas? Thnx!
>
>
Interesting indeed, in more than one way... This is just a plug-in, right?
So it can be compiled with Nutch 1.11?
On Thu, Dec 17, 2015 at 10:25 AM, Markus Jelsma
wrote:
> Interesting! That triple extractor and wdc parser could be useful indeed!
> It already uses any23. I wonder how easy we could
Hi,
I've just seen, on a website which tracks bots, that "Tarantula", our
Nutch 1.11-based crawler, is being classified as not obeying robots.txt.
What's the solution?
Science Group (IRDS)
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> WWW: http://irds.usc.edu/
I would like to "groom" the crawldb. My guess is that it should be an
easy thing just to build upon the function that removes the 404 status and
duplicates. But where do I find these?
Thank you
Solr 6 uses a different directory structure now; follow the Solr tutorials on
how to create a core. They will tell you where it creates the cores
directory; inside that directory should be a directory called /conf, and that's
where the schema goes. It's also a good idea to read as much as possible on
Solr, nu
of thumb for Internet development : everything you need to know about
the Internet can be found on the Internet.
On 13/6/2016 15:53, "Jose-Marcio Martins da Cruz" <
jose-marcio.mart...@mines-paristech.fr> wrote:
>
> Hi,
>
> Thanks Blackice.
>
> Can you suggest me
Also I've learned a lot from the writings of Erik Hatcher from Lucidworks,
probably the number 1 authority in the world on everything related to
Solr.
On 13/6/2016 15:53, "Jose-Marcio Martins da Cruz" <
jose-marcio.mart...@mines-paristech.fr> wrote:
>
> Hi,
>
>
enance,
> otherwise 404s found by dead links are fetched again and again.
>
> Sebastian
>
> On 06/14/2016 10:23 PM, Lewis John Mcgibbney wrote:
> > Hi BlackIce,
> >
> > On Mon, Jun 13, 2016 at 1:57 PM,
> wrote:
> >
> >> From: BlackIce
> >> T
Hi,
Up till now we have been running Nutch and Solr on the same machine. But
now we have a scenario where we want to run separate Nutch
instances on separate machines and index to Solr over the Internet.
Since the indexing to Solr will be done over the public Internet, it
presents us with
might want to open a ticket for.
Thanks
Lewis
On Wed, Jul 20, 2016 at 6:11 AM, wrote:
> From: BlackIce
> To: user@nutch.apache.org
> Cc:
> Date: Wed, 20 Jul 2016 15:11:22 +0200
> Subject: Indexing to remote Solr server
> Hi,
>
> Up till now we have been running nutch and s
I had a similar problem once.. it was some stupid syntax thing; lemme
check my setup.
On Fri, Sep 9, 2016 at 2:46 PM, KRIS MUSSHORN wrote:
> Looks like this is NOT in fact working.
>
> How do I get the metatags into Solr?
>
> i have a webpage @ https://snip/inside/directorates/cisd/asset.cfm
oring-opic|urlnormalizer-(pass|regex|basic)

<property>
  <name>index.parse.md</name>
  <value>metatag.description,metatag.keywords,h1,h2,h3,h4,h5,h6,metatag.title</value>
</property>
<property>
  <name>metatags.names</name>
  <value>description,keywords,title,h1,h2,h3,h4,h5,h6</value>
</property>
On Fri, Sep 9, 2016 at 3:00 PM, BlackIce wrote:
> I had a similar problem once.
shorn@mail.mil
> ~~
>
> -Original Message-
> From: BlackIce [mailto:blackice...@gmail.com]
> Sent: Friday, September 09, 2016 9:31 AM
> To: user@nutch.apache.org
> Subject: [Non-DoD Source] Re: indexing metatags with Nutch 1.12
>
> All
Change the -1 to a positive number, like 5 or so (in the command).
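In other words, taking the quoted command, replace the trailing -1 (crawl "forever") with a fixed number of rounds, e.g.:

```shell
# 5 rounds instead of unlimited; core name taken from the quoted command
./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/TEST_CORE urls/ crawl 5
```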
On Sep 9, 2016 8:20 PM, "KRIS MUSSHORN" wrote:
> Executing this does NOT index everything in and under seed.txt.
>
> ./bin/crawl -i -D solr.server.url=http://localhost:8983/solr/TEST_CORE
> urls/ crawl -1
>
> I have to run it m
it found in the first run, on the 3rd run it will fetch the links it found
on the 2nd run and so forth...
Have a great weekend everyone !
On Fri, Sep 9, 2016 at 9:05 PM, Comcast wrote:
> Tried that. Same result
>
> Sent from my iPhone
>
> > On Sep 9, 2016, at 3:04 P
Can we now use Open Graph metadata, and if so, how?
Thnx
Ralf
Try these, don't remember which I used and don't have access to my setup
right now (there used to be a whitelist/blacklist plugin, but I don't seem
to be able to find it on Google right now)
https://github.com/BayanGroup/nutch-custom-search
On Sep 30, 2016 7:35 PM, "KRIS MUSSHORN" wrote:
Ok bas
Then make your own :)
On Sep 30, 2016 11:13 PM, "Kris Musshorn" wrote:
> Thanks blackice but I can't use a plug-in that's not been maintained in a
> year in my production environment
>
> -Original Message-
> From: BlackIce [mailto:blackice...@gmail.com]
> Sent
Hi Filip,
You mentioned that you commented out "External Links" - what do the links
look like that point to the images? do they start with ":www.server.com" or
something like "images.server.com"? With "External Links" turned off Nutch
should interpret those links as "external sites" and thus not
d like to know is there a way to take control over the
> search for the new links, especially if it's possible within the realm of
> plugins.
>
> 2017-05-24 17:08 GMT+02:00 BlackIce :
>
> > Hi Filip,
> >
> >
> > You mentioned that you commented out "Exter
Why would it be forbidden?
Wasn't Cuba removed from the blocked Nations list under President Obama?
On Fri, May 26, 2017 at 2:42 PM, Eyeris Rodriguez Rueda
wrote:
> Hi all.
> I really want to install Ambari 2.5.0 and hadoop cluster in centos 7 but
> when i try to access to the webpage it looks
treaties.
The first step would be to contact Hortonworks compliance officer and see
if indeed this item falls under such restrictions and then go from there.
Hope this helps!
Greetings!
Ralf Kotowski
www.enlle.com
"La revolucion no sera televisada"
On Fri, May 26, 2017 at 5:13 PM, Black
forbidden also.
>
> The problem is that I don't know how to continue.
> Maybe I will use a proxy to try to download the packages.
>
> I am very happy for your answer and for your time to call the US Department.
> Really, thanks. In Cuba these things can be difficult.
>
I just got off the phone with someone at tech support over at Hortonworks who
will forward my message to the corresponding person; I hope to hear back
from them soon.
On Fri, May 26, 2017 at 8:15 PM, BlackIce wrote:
> do you have a list of the files in particular which are forbidden?
> This
Sometimes it helps when one replaces the solr.jar which comes with Nutch
with the solr.jar that comes with the Solr version one is using.
On Sat, Jul 8, 2017 at 3:52 PM, Pau Paches
wrote:
> Hi,
> I have run the Nutch 1.x Tutorial with Solr 6.6.0.
> Many things do not work, there is a mismatch between the
I think by default the newer Solr starts in "schemaless" mode. One needs to
create a config directory with ALL necessary configuration files, like the
schema and solrconfig.xml, BEFORE creating the collection, and then run a command
to create this collection using this conf directory. I don't have access to
m
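From memory, the sequence ends in roughly the following command (core name and configset path are assumptions):

```shell
# copy a prepared conf directory (schema.xml, solrconfig.xml, ...) into a configset,
# then create the core from it instead of letting Solr start schemaless
bin/solr create -c nutch -d /path/to/nutch_configset
```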
Sure, that would be most excellent!
On Sep 14, 2017 9:41 PM, "Hiran CHAUDHURI"
wrote:
> Hi there.
>
>
>
> When I tried to setup Nutch 1.13 to connect to Solr 6.6 I found out that
> the Nutch schema shipped in .../conf/schema.xml needs quite some tweaking
> before Solr can use it.
>
> The reason
My guess would be that you need to look at schema.xml and disable
PositionIncrements
On Thu, Sep 28, 2017 at 6:44 PM, Sol Lederman
wrote:
> Hi,
>
> I'm following the tutorial to set up nutch with solr. I'm using a supported
> pair: nutch 1.13 with solr 5.5.0. I get this error creating the nutch
17, 8:38 AM, "Sebastian Nagel"
> wrote:
> >
> > Hi Folks,
> >
> > thanks to everyone who was able to review the release candidate!
> >
> > 72 hours have passed, please see below for vote results.
> >
> > [8] +1 Release this
Awesome
On Mon, Dec 25, 2017 at 11:36 PM, Mattmann, Chris A (3010) <
chris.a.mattm...@jpl.nasa.gov> wrote:
> Great work Seb and team!
>
> Sent from my iPhone
>
> On Dec 25, 2017, at 1:29 PM, Jorge Betancourt mailto:betancourt.jo...@gmail.com>> wrote:
>
> Great news!
> Thanks Sebastian!
>
>
> Bes
Is it just me? The md5 checksums don't match
On Tue, Dec 26, 2017 at 5:35 AM, BlackIce wrote:
> Awesome
>
> On Mon, Dec 25, 2017 at 11:36 PM, Mattmann, Chris A (3010) <
> chris.a.mattm...@jpl.nasa.gov> wrote:
>
>> Great work Seb and team!
>>
>> Sen
I'm also getting this on the source tarball and zip:
gpg: BAD signature from "Sebastian Nagel " [unknown]
On Tue, Dec 26, 2017 at 5:48 AM, BlackIce wrote:
> Is it just me? The md5 checksums don't match
>
>
> On Tue, Dec 26, 2017 at 5:35 AM, BlackIce wrote:
Nevermind.. my bad.. I was trying to get the files with wget through the link
to the mirrors; obviously it would download only the HTML with the
mirror list.
Sorry
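For completeness, the verification steps that tripped me up (file names are assumptions for whichever release you grabbed; fetch the .md5/.asc from the main Apache site, not a mirror listing):

```shell
md5sum -c apache-nutch-1.14-src.tar.gz.md5
gpg --verify apache-nutch-1.14-src.tar.gz.asc apache-nutch-1.14-src.tar.gz
```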
On Tue, Dec 26, 2017 at 6:02 AM, BlackIce wrote:
> I'm also getting this on the source tarball and zip:
>
Hi,
As stated, it's a Solr question... But I'll give you a hint (I don't have
access to the server right now)... Stemming is different for Spanish than for
English... If I remember correctly I had to use the hunspell tokenizer set
for Spanish, or something similar to that..
Sorry I can't be more pre
Also, in order for Spanish accents to be properly stemmed... something had
to be set to ISO Latin... and a proper file had to be supplied to
Solr.
I'm on a tablet and can't access the server to look
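Since I can't reach the server, here is a hedged reconstruction of the kind of Solr fieldType this ends up as (the dictionary file names are assumptions, and the actual setup may have used a different accent-folding filter):

```xml
<fieldType name="text_es" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- folds accented characters (á, ñ, ...) to their ASCII base forms -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.HunspellStemFilterFactory" dictionary="es_ES.dic" affix="es_ES.aff"/>
  </analyzer>
</fieldType>
```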
On Feb 13, 2018 10:03 PM, "BlackIce" wrote:
Hi,
As stated
Hi,
I ran into a problem with Nutch 1.14 which I don't recall having in
previous versions.
I'm finding a lot of "\n" (newline?) in the content of crawled sites.
I've tried different configurations/constellations of the HTML parser and
Tika, and just Tika, to no avail.
All the info I can find on thi
ble.
>
> A simple
> s/\n/ /g
> should restore the old "look" of extracted plain texts.
>
> Best,
> Sebastian
>
>
> On 02/26/2018 04:17 PM, BlackIce wrote:
> > Hi,
> >
> > did run into a problem with Nutch 1.14 which I don't recall havin
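Sebastian's s/\n/ /g amounts to a trivial post-processing step; a self-contained sketch:

```shell
# collapse the newline characters in extracted plain text back to spaces
printf 'first line\nsecond line\n' | tr '\n' ' '
```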
Basically what you're saying is that you need more control over what is
being indexed?
That's an excellent question!
Greetz!
On Mar 17, 2018 11:46 AM, "ShivaKarthik S"
wrote:
> Hi,
>
> Is there any way to block the hub pages & index only the articles from the
> websites. I wanted to index only
+1
stoopid question, but I can't find any info on it... can we now parse Open
Graph metatags?
Greetz
On Mon, Jun 11, 2018 at 9:11 PM Roannel Fernández Hernández
wrote:
> +1
>
> Regards
>
> - Chris Mattmann wrote:
> > ++1!
> >
> >
> >
> > Sounds great.
> >
> >
> >
> > Cheers,
> >
> > Ch
og:description :The Open Graph protocol enables any web page to
> become a rich object in a
> social graph.
>
>
> On 06/11/2018 11:44 PM, BlackIce wrote:
> > +1
> >
> > stoopid question, but I can't find any info on it... can we now parse
> Open
>
PS: Does this work when configured in site.xml like regular metadata?
On Tue, Jun 12, 2018 at 1:31 PM BlackIce wrote:
> sweet thnx!
>
> On Tue, Jun 12, 2018 at 1:29 PM Sebastian Nagel <
> wastl.na...@googlemail.com> wrote:
>
>> > stoopid question, but I can'
overwrites definition in nutch-default.xml
>
> On 06/12/2018 02:26 PM, BlackIce wrote:
> > PS: Does this work when configured in site.xml like regular metadata?
> >
> > On Tue, Jun 12, 2018 at 1:31 PM BlackIce wrote:
> >
> >> sweet thnx!
> >>
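Putting the thread's answer in one place: a sketch of what the nutch-site.xml overrides might look like (whether the og:-prefixed names pass through unchanged is an assumption to verify against your parse-metatags version):

```xml
<property>
  <name>metatags.names</name>
  <value>og:title,og:description</value>
</property>
<property>
  <name>index.parse.md</name>
  <value>metatag.og:title,metatag.og:description</value>
</property>
```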
Splendid
On Tue, Aug 7, 2018 at 3:46 PM lewis john mcgibbney
wrote:
> Excellent. Thanks for taking on release manager Seb, it’s making a huge
> impact. Nice work folks.
>
> On Tue, Aug 7, 2018 at 05:37 wrote:
>
> >
> > user Digest 7 Aug 2018 12:37:25 - Issue 2921
> >
> > Topics (messages 34
I think you are correct in your assumption.
According to this:
https://issues.apache.org/jira/browse/NUTCH-2620?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
Nutch assumes that the TLD is no longer than 4 characters; this is in
the process of being fixed in the next rel
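To illustrate the 4-character assumption with a stand-in pattern (not Nutch's actual code, just the shape of the bug):

```shell
# a naive check capping TLDs at 2-4 letters accepts .com but rejects newer TLDs like .online
echo "https://example.com/"    | grep -qE '\.[a-z]{2,4}(/|$)' && echo "accepted"
echo "https://example.online/" | grep -qE '\.[a-z]{2,4}(/|$)' || echo "rejected"
```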
Try making these fields "multiValued", like so:
...
On Thu, Aug 30, 2018 at 1:45 PM Amarnatha Reddy wrote:
> Hi Nutch Team,
>
> We are trying to crawl websites which are Korean and Japanese language
> based; while indexing data into Solr we are getting the below error,
> kindly sugg
Sorry if this seems trivial, but did you reload the collection and/or
restart Solr?
On Thu, Aug 30, 2018 at 4:19 PM Amarnatha Reddy wrote:
> I am still facing the same issue after changing the suggested values; any
> clue, please?
>
> Amarnath
>
> On Thu 30 Aug, 2018, 7:50 P
There was a plugin a while ago which allowed you to specify different tags
to be indexed or excluded from being indexed; if I'm not mistaken it was
this:
http://www.longconnections.com/blog/2015/6/3/using-apache-nutchsolr-to-build-a-search-engine-with-auto-complete-feature
Good luck and please let