Stemmer

2009-01-19 Thread David Jashi
Hello, everyone.

Is there any chance of making Nutch call the stemmer in batch? That is, pass it
not a single word (token) but an array of words. My stemmer has external
parts that are called via HTTP requests, so you can imagine what performance
overhead I have.
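
To make the request concrete, here is a minimal sketch of what batching could look like on the caller's side. It is not Nutch code and does not show how Nutch's analysis chain would hand over its tokens; the class name, the stemAll method and the stand-in stemmer are all hypothetical. The assumption is only that the remote stemmer can accept a list of words and return their stems in the same order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class BatchStemSketch {

    // Stems every token using ONE call to the batch stemmer.
    // Duplicate words are sent only once; results are mapped back by position.
    static List<String> stemAll(List<String> tokens,
                                Function<List<String>, List<String>> batchStemmer) {
        List<String> distinct = new ArrayList<>(new LinkedHashSet<>(tokens));
        List<String> stems = batchStemmer.apply(distinct);
        if (stems.size() != distinct.size()) {
            throw new IllegalStateException("batch stemmer returned a different number of stems");
        }
        Map<String, String> stemByWord = new HashMap<>();
        for (int i = 0; i < distinct.size(); i++) {
            stemByWord.put(distinct.get(i), stems.get(i));
        }
        List<String> result = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            result.add(stemByWord.get(token));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the remote HTTP stemmer: strips a trailing "s".
        Function<List<String>, List<String>> fakeRemoteStemmer = words -> {
            List<String> out = new ArrayList<>();
            for (String w : words) {
                out.add(w.endsWith("s") ? w.substring(0, w.length() - 1) : w);
            }
            return out;
        };
        System.out.println(stemAll(Arrays.asList("cats", "dogs", "cats"), fakeRemoteStemmer));
    }
}
```

With a real HTTP-backed stemmer in place of the stand-in, the cost would be one round trip per batch instead of one per token.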

-- 
with best regards,
David Jashi
Web development EO,
Caucasus Online
+995(32)970368
da...@jashi.ge



nutch database

2009-01-19 Thread ripper07

Hi,
I've got two questions about the Nutch database.

1. Can Nutch search results be accessed in some way other than through localhost?

2. We need a stand-alone application to access Nutch's database while the
crawler is still running. Is there a way to do that, or are the indexes
built only at the end of the crawl?
-- 
View this message in context: 
http://www.nabble.com/nutch-database-tp21552599p21552599.html
Sent from the Nutch - User mailing list archive at Nabble.com.



Searching on a specific index field

2009-01-19 Thread ahammad

I have an index which contains fields that are extracted from meta tags. I
used a plugin that someone on this mailing list wrote years ago. Basically,
the plugin allows the extraction and indexing of HTML meta tags. Using Luke,
I verified that the HTML meta tags were indexed.

From reading the mailing list, I know that there needs to be a query plugin
for the indexer (usually based on query-site). However, the plugin-writing
example on the Wiki doesn't mention that you need a separate plugin for
querying. Also, the plugin code that I received had all the source files
(including the query filter) packaged under one plugin.

Everything was added in the build.xml file, nutch-default.xml, and
nutch-site.xml (even though the plugin worked without any modifications to
nutch-site.xml). I then ran ant to build it. The log files show that the
plugin was included in the build when I crawled.

My question is this: is it possible to have one query filter that works on
all the tags, or do I need a separate plugin for every meta tag? I have 21
meta tags, so that wouldn't be a viable solution.

I should note that the code I got from the author worked for him, but not
for me. Could it be that I missed a configuration step that basically tells
Nutch to use the query filter?

Do I need to redeploy the war file in Tomcat? When I build the source code,
a new war file is created in C:\nutch\build. Do I need to replace the war
file in C:\nutch with the one in C:\nutch\build?

Thanks. Let me know if you need any more information. I'm not sure if I was
very descriptive.

Cheers


-- 
View this message in context: 
http://www.nabble.com/Searching-on-a-specific-index-field-tp21551514p21551514.html
Sent from the Nutch - User mailing list archive at Nabble.com.



Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread Lyndon Maydwell
Lucene supports OR queries, so it should be possible to do this, but as
far as I know Nutch doesn't expose that support. I'd also be interested
to hear whether anyone has managed to implement it.

On Tue, Jan 20, 2009 at 1:50 AM, M S Ram  wrote:
> Oh! That's sad! :( What is the best approach to provide an OR search now?
> Should I go down to Lucene? Does Lucene understand HDFS? Please help me with
> the appropriate guidelines.
>
> Thank you,
> Ram
>
> Doğacan Güney wrote:
>>
>> Hi,
>>
>> On Mon, Jan 19, 2009 at 4:02 PM, M S Ram  wrote:
>>
>>>
>>> Hi,
>>>
>>> Does Nutch support the boolean OR operator (or something similar) in a
>>> search query? I mean is there any class already available to do this? The
>>> Nutch search interface doesn't seem to have this option.
>>>
>>> Expected functionality: If I ask it to search for (Post Graduate) OR
>>> (Masters), it should fetch the pages which contain at least one of {"Post
>>> Graduate", "Masters"}.
>>>
>>>
>>
>> Unfortunately no.
>>
>> There is an issue with a patch
>>
>> https://issues.apache.org/jira/browse/NUTCH-479
>>
>> but nothing happened for a while.
>>
>>
>>>
>>> Thank you,
>>> Ram.


Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread M S Ram
Oh! That's sad! :( What is the best approach to provide an OR search 
now? Should I go down to Lucene? Does Lucene understand HDFS? Please 
help me with the appropriate guidelines.


Thank you,
Ram

Doğacan Güney wrote:
> Hi,
>
> On Mon, Jan 19, 2009 at 4:02 PM, M S Ram wrote:
>> Hi,
>>
>> Does Nutch support the boolean OR operator (or something similar) in a
>> search query? I mean is there any class already available to do this? The
>> Nutch search interface doesn't seem to have this option.
>>
>> Expected functionality: If I ask it to search for (Post Graduate) OR
>> (Masters), it should fetch the pages which contain at least one of {"Post
>> Graduate", "Masters"}.
>
> Unfortunately no.
>
> There is an issue with a patch
>
> https://issues.apache.org/jira/browse/NUTCH-479
>
> but nothing happened for a while.
>
>> Thank you,
>> Ram.


Re: AW: Nutch Training Seminar

2009-01-19 Thread Dennis Kubes

Hi Guys,

Haven't decided where or when for the seminar yet.  Been pretty busy 
finishing up a few different projects.  Good news is as of today those 
are finished and now I will have more time to finish this up along with 
helping to get Nutch 1.0 released.  Sorry this is taking so long to put 
together.


Dennis

Girish Redekar wrote:

Hi Dennis -

Not sure if I'm too late, but I'm extremely eager to join the seminar too. I'm
particularly interested in understanding how to customize the Nutch scoring.

Apologies upfront for a naive question - how/where/when would such a seminar
be held (this is my first day with this mailing list).

Thanks,
Girish



Re: Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread Doğacan Güney
Hi,

On Mon, Jan 19, 2009 at 4:02 PM, M S Ram  wrote:
> Hi,
>
> Does Nutch support the boolean OR operator (or something similar) in a
> search query? I mean is there any class already available to do this? The
> Nutch search interface doesn't seem to have this option.
>
> Expected functionality: If I ask it to search for (Post Graduate) OR
> (Masters), it should fetch the pages which contain at least one of {"Post
> Graduate", "Masters"}.
>

Unfortunately no.

There is an issue with a patch

https://issues.apache.org/jira/browse/NUTCH-479

but nothing happened for a while.

> Thank you,
> Ram.
>



-- 
Doğacan Güney


Does Nutch support the boolean OR operator in a search query?

2009-01-19 Thread M S Ram

Hi,

Does Nutch support the boolean OR operator (or something similar) in a 
search query? I mean is there any class already available to do this? 
The Nutch search interface doesn't seem to have this option.


Expected functionality: If I ask it to search for (Post Graduate) OR 
(Masters), it should fetch the pages which contain at least one of 
{"Post Graduate", "Masters"}.
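
The expected behaviour can be spelled out in a small sketch. This is plain Java, not Nutch or Lucene code; the class name and the matchesAny method are hypothetical, and case-insensitive substring matching is an assumption made only for the illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class OrMatchSketch {

    // Returns true if the page text contains at least one of the phrases.
    // Matching is case-insensitive substring matching (an assumption of this sketch).
    static boolean matchesAny(String pageText, List<String> phrases) {
        String haystack = pageText.toLowerCase(Locale.ROOT);
        for (String phrase : phrases) {
            if (haystack.contains(phrase.toLowerCase(Locale.ROOT))) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> phrases = Arrays.asList("Post Graduate", "Masters");
        // Contains "Masters", so the page should be returned.
        System.out.println(matchesAny("Admissions for the Masters programme", phrases));
        // Contains neither phrase, so the page should not be returned.
        System.out.println(matchesAny("Undergraduate courses only", phrases));
    }
}
```

A page qualifies as soon as one phrase matches, which is the behaviour asked for above.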


Thank you,
Ram.


AW: login failedd exception

2009-01-19 Thread Koch Martina
Hi ,

this error is mentioned and solved in this message: 
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11169.html

If you're running Nutch on Windows, you need to have Cygwin installed, and
the PATH variable needs to include the following entries:
\bin;\usr\bin
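
A quick way to check whether the fix took effect is to see if a whoami executable is reachable through PATH from Java. The following is only a diagnostic sketch, not part of Nutch or Hadoop; the class and method names are hypothetical, and trying both "whoami" and "whoami.exe" is an assumption about how the file is named on Windows.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class WhoamiPathCheck {

    // Lists the file locations at which `program` could live for a given PATH value.
    // Both the bare name and the name with ".exe" appended are tried.
    static List<File> candidates(String program, String pathValue) {
        List<File> result = new ArrayList<>();
        if (pathValue == null || pathValue.isEmpty()) {
            return result;
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            result.add(new File(dir, program));
            result.add(new File(dir, program + ".exe"));
        }
        return result;
    }

    // True if any candidate location is an existing regular file.
    static boolean isOnPath(String program, String pathValue) {
        for (File f : candidates(program, pathValue)) {
            if (f.isFile()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        System.out.println(isOnPath("whoami", path)
                ? "whoami found on PATH"
                : "whoami NOT found on PATH");
    }
}
```

Run it from the same environment (for example the same Eclipse launch configuration) that starts Nutch, since that is the PATH the Hadoop code will see.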

Hope this helps.

Kind regards,
Martina

PS: Please don't post the same issue in two different lists.


-Original Message-
From: Vimal Varghese [mailto:vimal.vargh...@tcs.com] 
Sent: Monday, January 19, 2009 11:01
To: nutch-user@lucene.apache.org
Subject: login failedd exception

Hi,

I have configured the latest nutch from the nightly build in eclipse.

I am getting the following error.

Exception in thread "main" java.io.IOException: Failed to get the current user's information.
    at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
    at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:774)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
Caused by: javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
    ... 5 more


Is there any way to overcome this?

Regards,
Vimal Varghese
=-=-=
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you




Re: Problem with Nutch on Eclipse & NetBeans

2009-01-19 Thread M S Ram

Thanks to both Alex and Ramadhany. This worked. :)

- Ram

Imam Nur Ramadhany wrote:

Hi,

I had the same problem.
I changed " to ' 
from "/>

to 
and it works.

Cheers,

Ramadhany





From: M S Ram 
To: nutch-user@lucene.apache.org
Sent: Monday, January 19, 2009 5:26:00 PM
Subject: Problem with Nutch on Eclipse & NetBeans

Hi,

In the search.jsp file, there is a line as follows:

"/>

When I tried to invoke this from the client by submitting a search query in the 
Nutch search interface, I see the following error:

org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value  language + 
"/include/header.html" is quoted with " which must be escaped when used within 
the value.


And when I tried to escape the quotes within the quotes as follows,

"/>

Eclipse complains, saying "Syntax error on token "Invalid Character", ) 
expected".
NetBeans, for the same thing, says "Illegal character 92".

Please help me resolve this problem.

Thank you,
Ram.



  
  




Re: Problem with Nutch on Eclipse & NetBeans

2009-01-19 Thread Imam Nur Ramadhany
Hi,

I had the same problem.
I changed " to ' 
from "/>
to 
and it works.

Cheers,

Ramadhany





From: M S Ram 
To: nutch-user@lucene.apache.org
Sent: Monday, January 19, 2009 5:26:00 PM
Subject: Problem with Nutch on Eclipse & NetBeans

Hi,

In the search.jsp file, there is a line as follows:

"/>

When I tried to invoke this from the client by submitting a search query in the 
Nutch search interface, I see the following error:

org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value  
language + "/include/header.html" is quoted with " which must be escaped when 
used within the value.


And when I tried to escape the quotes within the quotes as follows,

"/>

Eclipse complains, saying "Syntax error on token "Invalid Character", ) 
expected".
NetBeans, for the same thing, says "Illegal character 92".

Please help me resolve this problem.

Thank you,
Ram.



  

Re: Problem with Nutch on Eclipse & NetBeans

2009-01-19 Thread Alexander Aristov
Replace the outer double quotes with single quotes.

Alex

2009/1/19 M S Ram 

> Hi,
>
> In the search.jsp file, there is a line as follows:
>
> "/>
>
> When I tried to invoke this from the client by submitting a search query in
> the Nutch search interface, I see the following error:
>
> org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value
>  language + "/include/header.html" is quoted with " which must be escaped
> when used within the value.
>
>
> And when I tried to escape the quotes within the quotes as follows,
>
> "/>
>
> Eclipse complains, saying "Syntax error on token "Invalid Character", )
> expected".
> NetBeans, for the same thing, says "Illegal character 92".
>
> Please help me resolve this problem.
>
> Thank you,
> Ram.
>



-- 
Best Regards
Alexander Aristov


Problem with Nutch on Eclipse & NetBeans

2009-01-19 Thread M S Ram

Hi,

In the search.jsp file, there is a line as follows:

"/>

When I tried to invoke this from the client by submitting a search query 
in the Nutch search interface, I see the following error:


org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value  language + 
"/include/header.html" is quoted with " which must be escaped when used within 
the value.


And when I tried to escape the quotes within the quotes as follows,

"/>

Eclipse complains, saying "Syntax error on token "Invalid 
Character", ) expected".

NetBeans, for the same thing, says "Illegal character 92".

Please help me resolve this problem.

Thank you,
Ram.


login failedd exception

2009-01-19 Thread Vimal Varghese
Hi,

I have configured the latest nutch from the nightly build in eclipse.

I am getting the following error.

Exception in thread "main" java.io.IOException: Failed to get the current user's information.
    at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
    at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:774)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127)
    at org.apache.nutch.crawl.Injector.inject(Injector.java:160)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:112)
Caused by: javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
    at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
    at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
    ... 5 more


Is there any way to overcome this?

Regards,
Vimal Varghese




Re: Nutch Training Seminar

2009-01-19 Thread Neil Rosewarm
Yes, No issues

Please advise

Thanks


--- On Mon, 1/19/09, Lukáš Vlček  wrote:
From: Lukáš Vlček 
Subject: Re: Nutch Training Seminar
To: nutch-user@lucene.apache.org
Date: Monday, January 19, 2009, 7:16 AM

Hi,
Have you already decided how you are going to do the training from the
technology point of view? If it is going to be just online live streaming,
will there be a chance (will we be allowed) to record it onto a local HD for
later personal review?

Regards,
Lukas

On Mon, Jan 19, 2009 at 5:22 AM, Girish Redekar wrote:

> Hi Dennis -
>
> Not sure if I'm too late, but I'm extremely eager to join the seminar too. I'm
> particularly interested in understanding how to customize the Nutch
> scoring.
>
> Apologies upfront for a naive question - how/where/when would such a seminar
> be held (this is my first day with this mailing list).
>
> Thanks,
> Girish
>



-- 
http://blog.lukas-vlcek.com/