On Thu, Nov 25, 2010 at 6:40 AM, Chris Woolum cwoo...@moonvalley.com wrote:
Hello everyone,
I am new to nutch and am having a problem with my initial deployment of
it. It does not seem that nutch is properly parsing the SEGMENT string
and is trying to search invalid folders. I am using
On Wed, Dec 8, 2010 at 6:33 PM, shi wang wangshi.t...@gmail.com wrote:
I want to subscribe to the Nutch user mailing list.
Please see http://nutch.apache.org/mailing_lists.html . Presumably, you want
to subscribe to the users list, so sending mail to
user-subscr...@nutch.apache.org
will work.
On Fri, Dec 24, 2010 at 11:33 AM, Luis Taveras ltavera...@yahoo.com wrote:
Please subscribe to the mailing list.
You should send mail to user-subscr...@nutch.apache.org in order to be
subscribed to the list. Please see http://nutch.apache.org/mailing_lists.html
Regards,
Gora
On Tue, Jan 4, 2011 at 11:36 PM, alx...@aim.com wrote:
Hello,
Thank you for your response.
Let me give you more detail of the issue that I have.
First definitions. Let say I have my own domain that I host on a dedicated
server and call it mydomain.com
Next, call subdomain the followings
On Wed, Jan 5, 2011 at 11:25 PM, alx...@aim.com wrote:
I do search directly in Nutch version 1-2.
I think google gives very low scores to subpages of a domain and higher
scores to other domains for a given keyword.
That is possible, though I am not sure why the situation is different with
On Fri, Jan 14, 2011 at 11:13 PM, Asier Martínez axi...@gmail.com wrote:
Hi again,
I'm having performance issues due to my DNS server configuration. I'm
now using public DNS servers (like Google's, etc.) and there seems to be
a certain limit on the number of query responses at the same time.
I'm reading about
On Wed, Jan 26, 2011 at 9:15 AM, Adam Estrada
estrada.adam.gro...@gmail.com wrote:
Curious...I have been using Nutch for a while now and have never tried to
index any audio or video formats. Is it feasible to grab the audio out of
both forms of media and then index it? I believe this would
On Wed, Jan 26, 2011 at 7:17 PM, Estrada Groups
estrada.adam.gro...@gmail.com wrote:
Thanks Gora! I am interested in searching through the text from these audio
and video streams. An example would be a 911 dispatch call and maybe even all
the recorded official chatter about it. That is just
On Wed, Jan 26, 2011 at 7:38 PM, Adam Estrada
estrada.adam.gro...@gmail.com wrote:
Another example would be the content embedded in this flash movie.
http://digitalmedia.worldbank.org/SSP/lac/investment-in-haiti/
[...]
ffmpeg can pull out audio from video streams, and a working
speech-to-text
On Sat, Feb 12, 2011 at 2:57 PM, Amna Waqar amna.waqar...@gmail.com wrote:
Hi all,
I want to know whether the ASF license of Nutch allows us to modify its code,
make a new search engine, and then start earning revenue on the basis of it.
[...]
Yes, it does. This might help:
On Sat, Feb 12, 2011 at 9:01 PM, Estrada Groups
estrada.adam.gro...@gmail.com wrote:
The disc failed on my PC so I will have to test out the patch on the Mac ;-).
Is this the version that is still reliant on Gora or have the two been mashed
together? I haven't looked at nightly builds in over
Hi,
If you mean changing just one field of a document, one cannot do that: Solr
is not an RDBMS.
However, one does not have to delete a document, and then reindex it.
Simply indexing a document with the same ID, with all fields including the
changed one, updates it in the index.
Regards,
Gora
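The update-by-reindexing approach described above can be sketched as follows. This is a minimal illustration, not Solr client code: it only builds the JSON body one might send to Solr's update handler, and it assumes the unique key field is named `id` and that the full current document is available to the caller. The helper name `build_update_payload` and the sample field names are hypothetical.

```python
import json


def build_update_payload(current_doc, changed_fields):
    """Build a JSON body that re-indexes a whole document under the same ID.

    Assumption: the schema's unique key field is called "id". Because the
    ID is unchanged, indexing this document replaces the old copy.
    """
    # Merge ALL existing fields with the changed ones; sending only the
    # changed field would drop every other field from the indexed document.
    full_doc = {**current_doc, **changed_fields}
    if "id" not in full_doc:
        raise ValueError("document needs its unique key ('id' assumed here)")
    return json.dumps([full_doc], sort_keys=True)


if __name__ == "__main__":
    doc = {"id": "doc-1", "title": "Old title", "body": "Unchanged text"}
    print(build_update_payload(doc, {"title": "New title"}))
```

The resulting body would then be posted to the update handler of the relevant Solr core; the exact endpoint path depends on the Solr version and is left out here on purpose.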
On Wed, Mar 23, 2011 at 3:26 PM, Gabriele Kahlout
gabri...@mysimpatico.com wrote:
$ cat wikipedia_links_simple.nt | grep http://simple.wiki; | awk '{print
$1}' | sort -u | sed -E 's/|//g'
I have lost track of what you were trying to do, but it really should
not be that difficult. Taking the
Hi,
Been a while since I have personally used Nutch
in a production environment, but if you are using
some kind of a CMS/framework that provides
hooks for page creation/modification, your best
option might be to use such a hook to trigger a
recrawl of the page.
At least that is the solution
Hi,
Please see http://nutch.apache.org/mailing_lists.html
for how to subscribe to various mailing lists.
Regards,
Gora
Hi,
Not too familiar these days with Nutch, but my guess is that a Solr
analyser is getting applied. To have a field exactly as is, use the String
fieldtype in Solr's schema.xml rather than the text fieldtype.
Regards,
Gora
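As a rough illustration of why the field type matters, the toy sketch below contrasts an analysed field with a verbatim one. It is not Solr's analysis chain: the lowercase-and-split behaviour is an assumed stand-in for whatever analyser the text fieldtype is configured with, and both function names are hypothetical.

```python
import re


def analyzed_text_value(value):
    """Toy stand-in for an analysed text field: lowercase, split into words."""
    return re.findall(r"\w+", value.lower())


def string_value(value):
    """Toy stand-in for a string field: the value is kept as one exact term."""
    return [value]


if __name__ == "__main__":
    raw = "Foo-Bar Baz"
    print(analyzed_text_value(raw))  # several normalised tokens
    print(string_value(raw))         # the original value, untouched
```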
On 05-Aug-2011 6:35 PM, Marek Bachmann m.bachm...@uni-kassel.de wrote:
On Wed, Aug 31, 2011 at 1:52 PM, Johan Svensson
johan.svens...@euroling.se wrote:
I want to put different weights to different domains, so that I can push up
results from my main site. Say for example, I have www.example.com with a
few but important pages, and blog.example.com with zillions of
On Wed, Aug 31, 2011 at 2:51 PM, Johan Svensson
johan.svens...@euroling.se wrote:
Thank you! This looks interesting. However, I wonder if it really can solve
this problem. No part of the search query is necessarily part of the
domain name. Let's say for example that we search for foobar.
On Wed, Sep 7, 2011 at 1:16 PM, Danicela nutch danicela-nu...@mail.com wrote:
[...]
The first time I put spellcheck.build=true in the request, the index was
modified, but it is only 20 bytes. (I think that's strange for 7000 indexed
pages)
This seems to indicate that something went wrong
On Thu, Jan 5, 2012 at 4:42 AM, niviksha nivik...@gmail.com wrote:
Hi all, this is my first post.
I've used lucene extensively in the past, but am just getting my feet wet
with Nutch. The problem I have is to use Nutch to crawl relational (sql)
databases. Is this possible via the current plug
On 25 January 2013 16:05, peterbarretto peterbarrett...@gmail.com wrote:
I still get the below error after setting the java home variable
http://lucene.472066.n3.nabble.com/file/n4036204/nutch_java_home_error.png
Not sure of how much experience you have had with Unix-style
shell quoting, but
On 29 January 2013 16:20, peterbarretto peterbarrett...@gmail.com wrote:
Tried escaping the whitespace but it still did not work, so I installed Java
in another folder and now the installation works just fine
[...]
The message that I had referenced seems to say that one
should *not* be escaping
On 11 March 2013 15:04, Rohan Thakur rohan.i...@gmail.com wrote:
hi
I am new to Nutch. I wanted to know whether Nutch takes care of any kind of
format change in the URLs that we have set to crawl, without requiring
any manual changes for the kind of changes that have been applied to the URLs
to
On 5 June 2013 03:53, Julien Nioche lists.digitalpeb...@gmail.com wrote:
Check your URL filters e.g. that you removed the lines below which are
there by default
# skip URLs containing certain characters as probable queries, etc.
-[?*!@=]
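To see what the quoted default rule does, here is a small sketch of first-match-wins URL filtering. It is an approximation written for illustration, not Nutch's own filter code: the `-` rule is the one quoted above, while the catch-all `+.` rule, the `RULES` list and the `accepts` function are assumptions added for the example.

```python
import re

# Each rule is (sign, pattern): "-" rejects and "+" accepts a matching URL.
RULES = [
    ("-", re.compile(r"[?*!@=]")),  # the default rule quoted above
    ("+", re.compile(r".")),        # assumed catch-all: accept anything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern occurs in the URL is "+"."""
    for sign, pattern in rules:
        if pattern.search(url):
            return sign == "+"
    # No rule matched at all: treat the URL as rejected.
    return False


if __name__ == "__main__":
    for url in ("http://example.com/page", "http://example.com/page?id=1"):
        print(url, accepts(url))
```

With the `-` rule in place, any URL carrying a query string is dropped before fetching, which is why removing that line matters when the pages of interest are reached through `?` parameters.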
[...]
Not directly related to your question, but I
On Mar 26, 2014 1:02 AM, John Lafitte jlafi...@brandextract.com wrote:
I set up a script that uses freegen to manually index new/updated URLs. I
thought it was working great, but now I'm just realizing that Solr returns
a score of 0 for these new documents. I thought the score was calculated
On 22 June 2014 22:07, Meraj A. Khan mera...@gmail.com wrote:
Hello Folks,
I have noticed that Nutch resources and mailing lists are mostly geared
towards the usage of Nutch in research-oriented projects. I would like to
know from those of you who are using Nutch in production for large
On 23 June 2014 01:44, Meraj A. Khan mera...@gmail.com wrote:
Gora,
Thanks for sharing your admin perspective. Rest assured I am not trying
to circumvent any politeness requirements in any way. As I mentioned
earlier, I am within the crawl-delay limits that are being set by the web
On 11 June 2015 at 15:30, Deepa Jayaveer deepa.jayav...@tcs.com wrote:
Thanks a lot for your response.
Can Nutch handle POST requests?
Don't think so. How would it know what POST data is expected by the page?
Regards,
Gora
Hi,
An HTTP 501 error is a method not implemented error, as you could have
searched and found out. What that means is that the server you are trying
to crawl does not implement GET for that URL.
Regards,
Gora
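For reference, the meaning of the 501 status code can be checked with Python's standard library. The two helper functions below are hypothetical names introduced for this sketch; they do not contact any server.

```python
from http import HTTPStatus


def describe_status(code):
    """Return the numeric code together with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def is_method_not_implemented(code):
    """True when the code says the server does not support the request method."""
    return code == HTTPStatus.NOT_IMPLEMENTED


if __name__ == "__main__":
    print(describe_status(501))
    print(is_method_not_implemented(501))
```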
On 11 June 2015 at 14:37, Deepa Jayaveer deepa.jayav...@tcs.com wrote:
Hi All,
Hi,
Would suggest starting out by looking at Common Crawl:
https://commoncrawl.org/
Regards,
Gora
30 matches