Stemmer
Hello, everyone. Is there any way to make Nutch call the stemmer in batches? That is, to pass it an array of words rather than a single word (token). My stemmer has external parts that are called via HTTP requests, so you can imagine the performance overhead I have. -- With best regards, David Jashi Web development EO, Caucasus Online +995(32)970368 da...@jashi.ge
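Whether Nutch offers a batch hook for stemmers is not answered in this thread. Independently of that, a common way to reduce per-token HTTP overhead is to put a small batching and caching layer in front of the remote service. The following is a minimal sketch in Python, not Nutch code: `BatchingStemmer` and `fake_remote_stem_batch` are hypothetical names, and the fake function merely stands in for the real HTTP call.

```python
from typing import Callable, Dict, Iterable, List


class BatchingStemmer:
    """Caches stems and sends only unseen words to the remote service, in one call."""

    def __init__(self, stem_batch: Callable[[List[str]], List[str]]) -> None:
        # stem_batch is an assumed interface: list of words in, list of stems out,
        # same length and order. In practice it would wrap one HTTP request.
        self._stem_batch = stem_batch
        self._cache: Dict[str, str] = {}
        self.remote_calls = 0  # counts round trips, for illustration only

    def stem_all(self, words: Iterable[str]) -> List[str]:
        words = list(words)
        # Unique words that are not cached yet, in first-seen order.
        missing = list(dict.fromkeys(w for w in words if w not in self._cache))
        if missing:
            self.remote_calls += 1
            stems = self._stem_batch(missing)
            if len(stems) != len(missing):
                raise ValueError("stemmer returned a different number of stems")
            self._cache.update(zip(missing, stems))
        return [self._cache[w] for w in words]


def fake_remote_stem_batch(words: List[str]) -> List[str]:
    """Hypothetical stand-in for the HTTP stemmer: strips a trailing 's'."""
    return [w[:-1] if w.endswith("s") else w for w in words]
```

With this layer, stemming all tokens of a document costs at most one round trip, and words already seen in earlier documents cost none.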
nutch database
Hi, I've got two questions about the Nutch database. 1. Can Nutch search results be accessed in some way other than through localhost? 2. We need a stand-alone application to access Nutch's database while the crawler is still running. Is there a way to do that, or are the indexes only built at the end of the crawl? -- View this message in context: http://www.nabble.com/nutch-database-tp21552599p21552599.html Sent from the Nutch - User mailing list archive at Nabble.com.
Searching on a specific index field
I have an index which contains fields that are extracted from meta tags. I used a plugin that someone on this mailing list wrote years ago. Basically, the plugin allows the extraction and indexing of HTML meta tags. Using Luke, I verified that the HTML meta tags were indexed. From reading the mailing list, I know that there needs to be a query plugin for the indexer (usually based off query-site). However, the writing-plugins example on the Wiki doesn't mention that you need a separate plugin for querying. Also, the plugin code that I received had all the source files (including the query filter) packaged under one plugin. Everything was added to the build.xml file, nutch-default.xml, and nutch-site.xml (even though the plugin worked without any modifications to nutch-site.xml). I then ran ant to build it. The log files show that the plugin was included in the build when I crawled. My question is this: is it possible to have a query filter that works on all the tags, or do I need a separate plugin for every meta tag? I have 21 meta tags, so that wouldn't be a viable solution. I should note that the code I got from the author worked for him, but not for me. Could it be that I missed a configuration step that basically tells Nutch to use the query filter? Do I need to redeploy the war file in Tomcat? When I build the source code, a new war file is created in C:\nutch\build. Do I need to replace the war file in C:\nutch with the one in C:\nutch\build? Thanks. Let me know if you need any more information. I'm not sure if I was very descriptive. Cheers -- View this message in context: http://www.nabble.com/Searching-on-a-specific-index-field-tp21551514p21551514.html Sent from the Nutch - User mailing list archive at Nabble.com.
Re: Does Nutch support the boolean OR operator in a search query?
Lucene has support for OR queries, so it should be possible to do it, but support for this in Nutch isn't available as far as I know. I'd also be interested if anyone has managed to implement this. On Tue, Jan 20, 2009 at 1:50 AM, M S Ram wrote: > Oh! That's sad! :( What is the best approach to provide an OR search now? > Should I go down to Lucene? Does Lucene understand HDFS? Please help me with > the appropriate guidelines. > > Thank you, > Ram > > Doğacan Güney wrote: >> Hi, >> >> On Mon, Jan 19, 2009 at 4:02 PM, M S Ram wrote: >>> Hi, >>> >>> Does Nutch support the boolean OR operator (or something similar) in a >>> search query? I mean is there any class already available to do this? The >>> Nutch search interface doesn't seem to have this option. >>> >>> Expected functionality: If I ask it to search for (Post Graduate) OR >>> (Masters), it should fetch the pages which contain at least one of {"Post >>> Graduate", "Masters"}. >> >> Unfortunately no. >> >> There is an issue with a patch >> >> https://issues.apache.org/jira/browse/NUTCH-479 >> >> but nothing happened for a while. >>> >>> Thank you, >>> Ram.
Re: Does Nutch support the boolean OR operator in a search query?
Oh! That's sad! :( What is the best approach to provide an OR search now? Should I go down to Lucene? Does Lucene understand HDFS? Please help me with the appropriate guidelines. Thank you, Ram Doğacan Güney wrote: Hi, On Mon, Jan 19, 2009 at 4:02 PM, M S Ram wrote: Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected functionality: If I ask it to search for (Post Graduate) OR (Masters), it should fetch the pages which contain at least one of {"Post Graduate", "Masters"}. Unfortunately no. There is an issue with a patch https://issues.apache.org/jira/browse/NUTCH-479 but nothing happened for a while. Thank you, Ram.
Re: Nutch Training Seminar
Hi Guys, I haven't decided where or when to hold the seminar yet. I've been pretty busy finishing up a few different projects. The good news is that as of today those are finished, so I will now have more time to finish this up, along with helping to get Nutch 1.0 released. Sorry this is taking so long to put together. Dennis Girish Redekar wrote: Hi Dennis - Not sure if I'm too late, but I'm extremely keen to join the seminar too. I'm particularly interested in understanding how to customize the Nutch scoring. Apologies upfront for a naive doubt - how/where/when would such a seminar be held (this is my first day with this mailing list). Thanks, Girish
Re: Does Nutch support the boolean OR operator in a search query?
Hi, On Mon, Jan 19, 2009 at 4:02 PM, M S Ram wrote: > Hi, > > Does Nutch support the boolean OR operator (or something similar) in a > search query? I mean is there any class already available to do this? The > Nutch search interface doesn't seem to have this option. > > Expected functionality: If I ask it to search for (Post Graduate) OR > (Masters), it should fetch the pages which contain at least one of {"Post > Graduate", "Masters"}. > Unfortunately no. There is an issue with a patch https://issues.apache.org/jira/browse/NUTCH-479 but nothing happened for a while. > Thank you, > Ram. > -- Doğacan Güney
Does Nutch support the boolean OR operator in a search query?
Hi, Does Nutch support the boolean OR operator (or something similar) in a search query? I mean is there any class already available to do this? The Nutch search interface doesn't seem to have this option. Expected functionality: If I ask it to search for (Post Graduate) OR (Masters), it should fetch the pages which contain at least one of {"Post Graduate", "Masters"}. Thank you, Ram.
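The functionality requested above amounts to a union of the result sets for each phrase. One client-side approximation, sketched here in Python under stated assumptions and not taken from Nutch itself, is to run each phrase as its own query and merge the hits. `search` is a hypothetical callable returning `(url, score)` pairs. Scores from separate queries are not strictly comparable, so the merged ranking is only a rough ordering.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Hit = Tuple[str, float]  # (url, score); an assumed result shape, not Nutch's


def or_search(search: Callable[[str], Iterable[Hit]],
              phrases: Iterable[str]) -> List[Hit]:
    """Union the hits of one query per phrase, keeping each URL's best score."""
    best: Dict[str, float] = {}
    for phrase in phrases:
        for url, score in search(phrase):
            if score > best.get(url, float("-inf")):
                best[url] = score
    # Highest score first; ties broken by URL so the order is deterministic.
    return sorted(best.items(), key=lambda hit: (-hit[1], hit[0]))
```

A page matching both phrases appears once, with the higher of its two scores.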
Re: login failed exception
Hi, this error is mentioned and solved in this message: http://www.mail-archive.com/nutch-user@lucene.apache.org/msg11169.html If you're running Nutch on Windows, you need to have Cygwin installed, and the PATH variable needs to include the following entries: \bin;\usr\bin Hope this helps. Kind regards, Martina PS: Please don't post the same issue to two different lists. -Original Message- From: Vimal Varghese [mailto:vimal.vargh...@tcs.com] Sent: Monday, 19 January 2009 11:01 To: nutch-user@lucene.apache.org Subject: login failed exception Hi, I have configured the latest Nutch from the nightly build in Eclipse. I am getting the following error. Exception in thread "main" java.io.IOException: Failed to get the current user's information. at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:774) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127) at org.apache.nutch.crawl.Injector.inject(Injector.java:160) at org.apache.nutch.crawl.Crawl.main(Crawl.java:112) Caused by: javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250) at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275) at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715) ... 5 more Is there any way to overcome this? Regards, Vimal Varghese =-=-= Notice: The information contained in this e-mail message and/or attachments to it may contain confidential or privileged information. If you are not the intended recipient, any dissemination, use, review, distribution, printing or copying of the information contained in this e-mail message and/or attachments to it are strictly prohibited. If you have received this communication in error, please notify us by reply e-mail or telephone and immediately and permanently delete the message and any attachments. Thank you
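The stack trace in this thread fails because the program `whoami` cannot be found, which is what the Cygwin PATH entries are meant to fix. That precondition can be checked before launching Nutch. The sketch below is a generic diagnostic in Python, not part of Nutch or Hadoop; it relies on the standard-library `shutil.which`, and the helper names `find_on_path` and `missing_programs` are made up for illustration.

```python
import shutil
from typing import List, Optional


def find_on_path(program: str, path: Optional[str] = None) -> Optional[str]:
    """Return the full path of `program` if it can be found, else None.

    `path` is a PATH-style string; None means use the PATH environment variable.
    """
    return shutil.which(program, path=path)


def missing_programs(programs: List[str], path: Optional[str] = None) -> List[str]:
    """Return the subset of `programs` that cannot be found on `path`."""
    return [p for p in programs if find_on_path(p, path) is None]
```

Calling `missing_programs(["whoami"])` in the same environment that starts Nutch (for example the Eclipse run configuration) returns an empty list once the PATH is set correctly.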
Re: Problem with Nutch on Eclipse & NetBeans
Thanks to both Alex and Ramadhany. This worked. :) - Ram Imam Nur Ramadhany wrote: Hi, I had the same problem. I changed " with ' from "/> to and it works. Cheers, Ramadhany From: M S Ram To: nutch-user@lucene.apache.org Sent: Monday, January 19, 2009 5:26:00 PM Subject: Problem with Nutch on Eclipse & NetBeans Hi, In the search.jsp file, there is a line as follows: "/> When I tried to invoke this from the client by submitting a search query in the Nutch search interface, I saw the following error: org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value language + "/include/header.html" is quoted with " which must be escaped when used within the value. And when I tried to escape the quotes within the quotes as follows, "/> Eclipse complains, saying "Syntax error on token "Invalid Character", ) expected". NetBeans, for the same thing, says "Illegal character 92". Please help me resolve this problem. Thank you, Ram.
Re: Problem with Nutch on Eclipse & NetBeans
Hi, I had the same problem. I changed " with ' from "/> to and it works. Cheers, Ramadhany From: M S Ram To: nutch-user@lucene.apache.org Sent: Monday, January 19, 2009 5:26:00 PM Subject: Problem with Nutch on Eclipse & NetBeans Hi, In the search.jsp file, there is a line as follows: "/> When I tried to invoke this from the client by submitting a search query in the Nutch search interface, I saw the following error: org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value language + "/include/header.html" is quoted with " which must be escaped when used within the value. And when I tried to escape the quotes within the quotes as follows, "/> Eclipse complains, saying "Syntax error on token "Invalid Character", ) expected". NetBeans, for the same thing, says "Illegal character 92". Please help me resolve this problem. Thank you, Ram.
Re: Problem with Nutch on Eclipse & NetBeans
Replace the outer quotes with single ones. Alex 2009/1/19 M S Ram > Hi, > > In the search.jsp file, there is a line as follows: > > "/> > > When I tried to invoke this from the client by submitting a search query in > the Nutch search interface, I saw the following error: > > org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value > language + "/include/header.html" is quoted with " which must be escaped > when used within the value. > > > And when I tried to escape the quotes within the quotes as follows, > > "/> > > Eclipse complains, saying "Syntax error on token "Invalid Character", ) > expected". > NetBeans, for the same thing, says "Illegal character 92". > > Please help me resolve this problem. > > Thank you, > Ram. > -- Best Regards Alexander Aristov
Problem with Nutch on Eclipse & NetBeans
Hi, In the search.jsp file, there is a line as follows: "/> When I tried to invoke this from the client by submitting a search query in the Nutch search interface, I saw the following error: org.apache.jasper.JasperException: /search.jsp(151,22) Attribute value language + "/include/header.html" is quoted with " which must be escaped when used within the value. And when I tried to escape the quotes within the quotes as follows, "/> Eclipse complains, saying "Syntax error on token "Invalid Character", ) expected". NetBeans, for the same thing, says "Illegal character 92". Please help me resolve this problem. Thank you, Ram.
login failed exception
Hi, I have configured the latest Nutch from the nightly build in Eclipse. I am getting the following error. Exception in thread "main" java.io.IOException: Failed to get the current user's information. at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717) at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592) at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:774) at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1127) at org.apache.nutch.crawl.Injector.inject(Injector.java:160) at org.apache.nutch.crawl.Crawl.main(Crawl.java:112) Caused by: javax.security.auth.login.LoginException: Login failed: Cannot run program "whoami": CreateProcess error=2, The system cannot find the file specified at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250) at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275) at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715) ... 5 more Is there any way to overcome this? Regards, Vimal Varghese
Re: Nutch Training Seminar
Yes, no issues. Please advise. Thanks --- On Mon, 1/19/09, Lukáš Vlček wrote: From: Lukáš Vlček Subject: Re: Nutch Training Seminar To: nutch-user@lucene.apache.org Date: Monday, January 19, 2009, 7:16 AM Hi, Did you already decide how you are going to do the training from the technology point of view? If it is going to be just online live streaming, will there be a chance (will we be allowed) to record it onto a local HD for later personal reiteration? Regards, Lukas On Mon, Jan 19, 2009 at 5:22 AM, Girish Redekar wrote: > Hi Dennis - > > Not sure if I'm too late, but I'm extremely keen to join the seminar too. I'm > particularly interested in understanding how to customize the Nutch > scoring. > > Apologies upfront for a naive doubt - how/where/when would such a seminar > be held (this is my first day with this mailing list). > > Thanks, > Girish > -- http://blog.lukas-vlcek.com/