Fwd: Reviving Nutch 0.7
-- Forwarded message -- From: Zaheed Haque [EMAIL PROTECTED] Date: Jan 22, 2007 10:13 AM Subject: Re: Reviving Nutch 0.7 To: nutch-dev@lucene.apache.org

On 1/22/07, Otis Gospodnetic [EMAIL PROTECTED] wrote:

Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today. However, I think there is still a need for something much simpler, something like what Nutch 0.7 used to be. Fairly regular nutch-user inquiries confirm this. Nutch has too few developers to maintain and further develop both of these concepts, and the main Nutch developers need the more powerful version - 0.8 and beyond. So, what is going to happen to 0.7? Maintenance mode? I feel that there is enough need for 0.7-style Nutch that it might be worth at least considering and discussing the possibility of somehow branching that version into a parallel project that's not just in maintenance mode, but has its own group of developers (not me, no time :( ) that pushes it forward. Thoughts?

I agree with you that there is a need for 0.7-style Nutch. I wouldn't say reviving, but more dissecting and re-directing :-). Here you go -- my focus here is 0.7-style, i.e. mid-size, enterprise need. Solr could use a good crawler because it has everything else (AFAIK). Probably this is not technically plug and play :-), and I am also not sure the Solr community wants a crawler, but it could benefit from such a Solr add-on/snap-on crawler. Furthermore, I am sure some of the 0.7 plugins could be re-factored to fit into Solr. I will forward the mail to the Solr community to see if there is any interest.

Cheers
[jira] Created: (SOLR-118) Some admin pages stop working with error 404 as the only symptom
Some admin pages stop working with error 404 as the only symptom
--
Key: SOLR-118
URL: https://issues.apache.org/jira/browse/SOLR-118
Project: Solr
Issue Type: Bug
Components: web gui
Environment: Fedora Core 4 (Linux version 2.6.11-1.1369_FC4smp), Sun's JVM 1.5.0_07-b03
Reporter: Bertrand Delacretaz
Priority: Minor

This was reported to the mailing list a while ago, see http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200610.mbox/[EMAIL PROTECTED]

Today I'm seeing the same thing on a Solr instance that has been running since January 9th (about 13 days) with the plain start.jar setup. Index contains 150'000 docs, 88322 search requests to date.

$ curl http://localhost:8983/solr/admin/analysis.jsp
<html>
<head>
<title>Error 404 /admin/analysis.jsp</title>
</head>
<body>
<h2>HTTP ERROR: 404</h2><pre>/admin/analysis.jsp</pre>
<p>RequestURI=/solr/admin/analysis.jsp</p>
...

$ curl http://localhost:8983/solr/admin/index.jsp
<html>
<head>
<title>Error 404 /admin/index.jsp</title>
</head>
<body>
<h2>HTTP ERROR: 404</h2><pre>/admin/index.jsp</pre>
<p>RequestURI=/solr/admin/index.jsp</p>
...

Other admin pages work correctly, for example http://localhost:8983/solr/admin/stats.jsp

I don't see any messages in the logs, which are capturing stdout and stderr from the JVM. I guess I'll have to restart this instance; I'm out of possibilities to find out what's happening exactly.

-- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: facet response
On Jan 21, 2007, at 11:54 PM, Yonik Seeley wrote:

On 1/21/07, Erik Hatcher [EMAIL PROTECTED] wrote:

In the built-in simple faceting, I get a Ruby response like this:

'facet_counts'=>{
  'facet_queries'=>{},
  'facet_fields'=>{
    'subject_genre_facet'=>{
      'Biography.'=>2605,
      'Congresses.'=>1837,
      'Bibliography.'=>672,
      'Exhibitions.'=>642,
      'Periodicals.'=>615},
  ...

This is using facet.limit=5 and no sort specified, so the items are being written in the proper order; however, they are written in Ruby Hash syntax, which does not iterate in a predictable order (like Java's Map). This really should be an Array in order for the client to assume the response is in a specified order. I think the response is best formatted as:

'facet_counts'=>{
  'facet_queries'=>{},
  'facet_fields'=>{
    'subject_genre_facet'=>[
      {'Biography.'=>2605},
      {'Congresses.'=>1837},
      {'Bibliography.'=>672},
      {'Exhibitions.'=>642},
      {'Periodicals.'=>615}],
  ...

This makes the navigation of the results a bit clunkier because each item in a field's array is a single-element Hash (Map), but the facets of a field really need to be in an array to maintain order. I presume this same dilemma is in the Python/JSON format too? In XML, the lst has the right semantics, and a parser would easily be able to deal with it in order:

<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">
    <lst name="subject_genre_facet">
      <int name="Biography.">2605</int>
      <int name="Congresses.">1837</int>
      <int name="Bibliography.">672</int>
      <int name="Exhibitions.">642</int>
      <int name="Periodicals.">615</int>
    </lst>
  ...

Thoughts?

But isn't this considered a bug in the data structure used to write out the facets?

Options:

- Resort yourself (for this specific case); it's probably going to be faster than having eval() create a new hash object for each anyway.

For sure it'd be no problem to re-sort on the client. I was more concerned about the purity of the response and it losing its semantics as an ordered list.

- Is there any place to hook into Ruby's parser and create a map that preserves its order? I would assume not.

Oh, I'm sure there is a way to accomplish that sort of trickery. Ruby is hyper-dynamic, so I'm quite confident that with a little voodoo this could be done. But it's the wrong way to approach this.

- Does Ruby have a JSON parser that preserves the order?

Sure, even in one line of code :) http://rubyforge.org/snippet/detail.php?type=snippet&id=29 But even more robustly this looks like the one to use: http://json.rubyforge.org/

- A way to specify a different structure for only certain elements...

Again, shouldn't an array be used in this context anyway, rather than a hash, regardless of which response writer is being used?

If that's your *only* sorting problem, find a ruby implementation of a hash or a map that preserves order, then re-sort the hash, and replace it with the order-preserving map, and then worry about a more general solution later?

Maybe what we need is a YAML response writer, and use an ordered map: http://yaml.org/type/omap.html

Erik
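The ordering problem raised here is not specific to Ruby; the same question comes up for the Python/JSON format Erik mentions. A minimal sketch in Python (facet names and counts borrowed from the example above) showing both workarounds: a parser hook that preserves wire order, and the array-of-single-pair-maps shape that makes order a guarantee of the format itself:

```python
import json
from collections import OrderedDict

# Facet payload shaped as a plain JSON object, as in the example above.
# A JSON object is nominally unordered, but object_pairs_hook hands the
# key/value pairs to us in the order they appear on the wire.
raw = '{"Biography.": 2605, "Congresses.": 1837, "Bibliography.": 672}'
facets = json.loads(raw, object_pairs_hook=OrderedDict)
print(list(facets.items()))

# The array-of-single-pair-objects shape Erik proposes: order is now part
# of the format itself, at the cost of clunkier element access.
raw_array = '[{"Biography.": 2605}, {"Congresses.": 1837}, {"Bibliography.": 672}]'
pairs = [next(iter(d.items())) for d in json.loads(raw_array)]
print(pairs)
```

Either way the client recovers the ranked order; the difference is whether the parser must cooperate (first form) or the wire format carries the guarantee (second form).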
[jira] Commented: (SOLR-118) Some admin pages stop working with error 404 as the only symptom
[ https://issues.apache.org/jira/browse/SOLR-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466468 ]

Bertrand Delacretaz commented on SOLR-118:
--

Yes, this is the Jetty that is bundled with Solr, Jetty/5.1.11RC0 according to the Server HTTP header. I haven't investigated on the Jetty side yet; it might be a known bug there.
[jira] Commented: (SOLR-118) Some admin pages stop working with error 404 as the only symptom
[ https://issues.apache.org/jira/browse/SOLR-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466464 ]

Yonik Seeley commented on SOLR-118:
---

What version of jetty was it? The one included with Solr? I don't personally have experience with Solr + Jetty and long uptimes. We use Resin in-house, and don't have any uptime issues.
Re: facet response
On 1/22/07, Erik Hatcher [EMAIL PROTECTED] wrote:

But isn't this considered a bug in the data structure used to write out the facets?

Not for JSON, I think... that's a wire format, and the order is what you see on the wire. It can be a problem preserving order, depending on the client. The problem is, for a large number of cases, the order doesn't matter, even where we use a named list. If you translate all named lists into arrays of arrays or arrays of maps, it unnecessarily bloats the XML, and makes the JSON much harder to read. For example, the top-level NamedList is ordered (responseHeader comes first), but in Ruby/Python we normally don't care since we can access by element name.

Options:

- Resort yourself (for this specific case); it's probably going to be faster than having eval() create a new hash object for each anyway.

For sure it'd be no problem to re-sort on the client. I was more concerned about the purity of the response and it losing its semantics as an ordered list.

- Is there any place to hook into Ruby's parser and create a map that preserves its order? I would assume not.

Oh, I'm sure there is a way to accomplish that sort of trickery. Ruby is hyper-dynamic, so I'm quite confident that with a little voodoo this could be done. But it's the wrong way to approach this.

- Does Ruby have a JSON parser that preserves the order?

Sure, even in one line of code :) http://rubyforge.org/snippet/detail.php?type=snippet&id=29 But even more robustly this looks like the one to use: http://json.rubyforge.org/

- A way to specify a different structure for only certain elements...

Again, shouldn't an array be used in this context anyway, rather than a hash, regardless of which response writer is being used?

A NamedList is used... that is ordered for XML. It seems like what would be ideal from a user perspective would be to have an ordered map, so you get random-access lookup.

An efficient representation using arrays would be two separate arrays: one for terms, one for counts.

terms=[foo,bar,baz]
counts=[30,20,10]

But you lose easy random-access lookup, and for a sufficiently large list, you lose the ability of a human to look at the raw response and correlate a count with the term.

So what about something that could output something like omap(term1,100,term2,45)?

The other alternative (besides changing *every* named list) is to have a facet.format and override the default structure.

-Yonik

If that's your *only* sorting problem, find a ruby implementation of a hash or a map that preserves order, then re-sort the hash, and replace it with the order-preserving map, and then worry about a more general solution later?

Maybe what we need is a YAML response writer, and use an ordered map: http://yaml.org/type/omap.html

Erik
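The two-parallel-arrays representation Yonik describes can be sketched client-side in a few lines (Python here; the terms and counts are the ones from the message). A `zip` restores the ordered pairs, and a plain dict restores the random-access lookup that the split representation gives up:

```python
# Two parallel arrays, one for terms and one for counts, as proposed above.
terms = ["foo", "bar", "baz"]
counts = [30, 20, 10]

# Zip them back into ordered (term, count) pairs, preserving the ranking:
ranked = list(zip(terms, counts))

# A plain dict recovers random-access lookup by term; ordering is then
# the pairs' job, not the dict's.
by_term = dict(ranked)

print(ranked)
print(by_term["bar"])
```

This illustrates the trade-off in the thread: the wire format stays compact and order-safe, but the client has to do a small amount of reassembly to get both iteration order and keyed lookup.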
[jira] Commented: (SOLR-118) Some admin pages stop working with error 404 as the only symptom
[ https://issues.apache.org/jira/browse/SOLR-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466489 ]

Yonik Seeley commented on SOLR-118:
---

Maybe it's time to upgrade to the latest Jetty, or at least start evaluating it? That would also remove the requirement for a JDK over a JRE, and speed up JSP page compilation too.
[jira] Commented: (SOLR-118) Some admin pages stop working with error 404 as the only symptom
[ https://issues.apache.org/jira/browse/SOLR-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466496 ]

Bertrand Delacretaz commented on SOLR-118:
--

Upgrading is probably a good idea, at least to a released 5.x version, as apparently we're using a release candidate.
Re: Fwd: Reviving Nutch 0.7
On Mon, 2007-01-22 at 10:13 +0100, Zaheed Haque wrote:

-- Forwarded message -- From: Zaheed Haque [EMAIL PROTECTED] Date: Jan 22, 2007 10:13 AM Subject: Re: Reviving Nutch 0.7 To: nutch-dev@lucene.apache.org

On 1/22/07, Otis Gospodnetic [EMAIL PROTECTED] wrote: Hi, I've been meaning to write this message for a while, and Andrzej's StrategicGoals made me compose it, finally. Nutch 0.8 and beyond is very cool, very powerful, and once Hadoop stabilizes, it will be even more valuable than it is today. However, I think there is still a need for something much simpler, something like what Nutch 0.7 used to be. Fairly regular nutch-user inquiries confirm this. Nutch has too few developers to maintain and further develop both of these concepts, and the main Nutch developers need the more powerful version - 0.8 and beyond. So, what is going to happen to 0.7? Maintenance mode? I feel that there is enough need for 0.7-style Nutch that it might be worth at least considering and discussing the possibility of somehow branching that version into a parallel project that's not just in maintenance mode, but has its own group of developers (not me, no time :( ) that pushes it forward. Thoughts?

I do not really want to comment on the 0.7 part of this discussion.

I agree with you that there is a need for 0.7-style Nutch. I wouldn't say reviving, but more dissecting and re-directing :-). Here you go -- my focus here is 0.7-style, i.e. mid-size, enterprise need. Solr could use a good crawler because it has everything else (AFAIK). Probably this is not technically plug and play :-), and I am also not sure the Solr community wants a crawler, but it could benefit from such a Solr add-on/snap-on crawler.

I used the forrest/cocoon CLI as a crawler in a forrest plugin I wrote. I will need to look into the nutch crawler code to see whether we could reuse this code. Not sure how closely this is married to the db, but I guess pretty close.
Furthermore I am sure some of the 0.7 plugins could be re-factored to fit into Solr.

The thing about introducing all these plugins into Solr is that we may come pretty soon into the situation the original thread is describing. We may blow up the simple thing that we want to solve into a well-defined problem with too many plugins and components. I like having Solr tools that do some well-defined processes, like updating the Solr server with crawled content, but like I said, they are IMO tools, not really part of the Solr core.

In the end, if you want an enhanced search experience via Solr with all the filter goodies, then you need to add more fields than the ones from e.g. the Nutch standard xhtml parser. Certain documents allow fine filtering based on additional information these documents may provide (year, type, organization, author, etc.). It is easy to write a single component to update a certain doc type or set of information against Solr, but IMO that should not be the focus of main Solr development. I think that should go into a tools/ dir.

I will forward the mail to Solr community to see if there any interest.

Thanks Zaheed. Fits well into the Update Plugins thread.

salu2

Cheers

-- thorsten Together we stand, divided we fall! Hey you (Pink Floyd)
Re: Update Plugins (was Re: Handling disparate data sources in Solr)
:3) there's a comment in RequestHandlerBase.init about indexOf that : comes from the existing impl in DismaxRequestHandler -- but doesn't match : the new code ... i also wasn't certain that the change you made matches

: I just copied the code from DismaxRequestHandler and made sure it : passes the tests. I don't totally understand what that case is doing.

: The first iteration of dismax (before we did generic defaults, : invariants, etc for request handlers) took defaults directly from the : init params, and that is what that case is checking for

and bingo .. the reason it jumped out at me in your patch is that the comment still referred to indexOf, but the code didn't ... it might be functionally equivalent, i just wasn't sure when i did my quick read.

there's mention in the comment that indexOf is used so that <null name="defaults"/> can indicate that you don't want all the init params as defaults, but you don't actually want defaults either -- but there doesn't seem to be a test for that case.

you can see support for the legacy defaults syntax in src/test/test-files/solr/conf/solrconfig.xml if you grep for dismaxOldStyleDefaults

-Hoss
Re: continuous integration for solrb
: While we are on the subject of continuous integration, does anyone think that : we should also do so for Lucene and Solr? Doing so would give us a heads-up if : changes in Lucene break Solr.

the current nightly solr builds may not be continuous, but they are regular, and they will fail and complain if there are any build/test failures in Solr.

rigging up a separate recurring build that always uses the latest lucene nightly build is an interesting idea ... but if Lucene starts doing more frequent releases and treats the trunk as more unstable (which is the direction it seems to be heading) this may not be that useful. Of course: if/when that happens, Solr will probably want to stop using nightly builds of lucene anyway, and only rev on their official point releases.

: : Bill : : On 1/22/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote: : On 1/22/07, Erik Hatcher [EMAIL PROTECTED] wrote: : : ...I don't know much about our Solaris zone, so could someone fill me in : on it a bit?... : : I haven't seen Solr's zone yet, but basically zones are Solaris : (virtual) machines where some of us can get root access, so we can : install anything there as long as it plays nice with other zones in : terms of CPU and memory usage. Currently all of the ASF's zones are : sharing a - fairly powerful - physical machine. : : For example, the Cocoon zone at http://cocoon.zones.apache.org/ runs : live demos of Cocoon pulled automatically out of SVN every few hours : by crontab scripts, the Continuum continuous integration server, and : the Daisy CMS for editing docs. : : There's more info at http://www.apache.org/dev/solaris-zones.html : : HTH, : -Bertrand : :

-Hoss
Re: facet response
On 1/22/07, Yonik Seeley [EMAIL PROTECTED] wrote:

An efficient representation using arrays would be two separate arrays: one for terms, one for counts.

terms=[foo,bar,baz]
counts=[30,20,10]

But you lose easy random-access lookup, and for a sufficiently large list, you lose the ability of a human to look at the raw response and correlate a count with the term.

Or if you want to retain the human readability, a single array:

[foo,30,bar,20,baz,10]

We could introduce some new types to tell the output handlers just how important it is to maintain order, so all named lists don't get treated the same. Example:

class OrderedNamedList extends NamedList {...}

Using OrderedNamedList means that it's really important that order be maintained, and we could use a different strategy such as interleaving keys and values in a single array (or another strategy set by json.orderednl?)

Thoughts?

-Yonik

So what about something that could output something like omap(term1,100,term2,45)?

The other alternative (besides changing *every* named list) is to have a facet.format and override the default structure.

-Yonik

If that's your *only* sorting problem, find a ruby implementation of a hash or a map that preserves order, then re-sort the hash, and replace it with the order-preserving map, and then worry about a more general solution later?

Maybe what we need is a YAML response writer, and use an ordered map: http://yaml.org/type/omap.html

Erik
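The interleaved single-array shape suggested above ([key1, count1, key2, count2, ...]) is easy for a client to decode back into ordered pairs plus a lookup table. A minimal Python sketch, using the values from the message:

```python
# Interleaved key/value array as it would arrive on the wire:
flat = ["foo", 30, "bar", 20, "baz", 10]

# Even-index elements are terms, odd-index elements are counts; zipping the
# two stride-2 slices rebuilds the ordered (term, count) pairs.
pairs = list(zip(flat[0::2], flat[1::2]))

# A dict over the pairs restores random-access lookup by term.
lookup = dict(pairs)

print(pairs)
print(lookup["baz"])
```

Unlike the two-parallel-arrays form, this keeps each term visually adjacent to its count in the raw response, which is the human-readability point being made above.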
Re: continuous integration for solrb
On 1/22/07, Chris Hostetter [EMAIL PROTECTED] wrote: Of course: if/when that happens, Solr will probably want to stop using nightly builds of lucene anyway, and only rev when their official point releases. I'd rather play that one by ear... I haven't rev'd the lucene version recently because of all the file format changes going on. At this point, I would feel a little more comfortable with waiting until the next release, or near to it. -Yonik
[jira] Commented: (SOLR-118) Some admin pages stop working with error 404 as the only symptom
[ https://issues.apache.org/jira/browse/SOLR-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466535 ]

Hoss Man commented on SOLR-118:
---

FYI: there was more to that original thread than the apache archives show (because they are split up by month). here's the full discussion... http://www.nabble.com/Admin-page-went-down-tf2548760.html#a7103716

...at the time i wasn't able to reproduce the problem, but i wasn't hammering the port very hard. i strongly suspect that, since the problem was only with the admin pages and all of the update/query functionality still worked fine, it was a JSP issue with Jetty.
[jira] Commented: (SOLR-118) Some admin pages stop working with error 404 as the only symptom
[ https://issues.apache.org/jira/browse/SOLR-118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466539 ]

Bertrand Delacretaz commented on SOLR-118:
--

"I suspect that it was a JSP issue with Jetty." Yes, certainly. Nothing seems to indicate a problem in Solr's code.
Re: continuous integration for solrb
On Jan 22, 2007, at 1:44 AM, Bertrand Delacretaz wrote: There's more info at http://www.apache.org/dev/solaris-zones.html Anyone here capable of giving me an account on the Lucene zone to let me tinker a bit? Or is Doug the right person to set me up? Thanks, Erik
Re: facet response
For the record, I predicted this problem would come up... http://www.nabble.com/JSON-output-support-tf1915406.html#a5247622

: We could introduce some new types to tell the output handlers just how : important it is to maintain order, so all named lists don't get : treated the same. : : Example: : class OrderedNamedList extends NamedList {...}

Ugh ... please no ... the List in NamedList is what indicates that order is a factor (it's what distinguishes a NamedList from a hypothetical MultiValueMap)

: Using OrderedNamedList means that it's really important that order be : maintained, and we could use a different strategy such as interleaving : keys and values in a single array (or another strategy set by : json.orderednl?)

i would much rather see us change any places in the Solr request handlers where order does *not* matter to just use a Map ... then the ResponseWriter could know that order in Maps doesn't matter, but order in NamedLists does.

-Hoss
Re: facet response
On 1/22/07, Chris Hostetter [EMAIL PROTECTED] wrote: For the record, I predicted this problem would come up... http://www.nabble.com/JSON-output-support-tf1915406.html#a5247622 : We could introduce some new types to tell the output handlers just how : important it is to maintain order, so all named lists don't get : treated the same. : : Example: : class OrderedNamedList extends NamedList {...} Ugh ... please no ... the List in NamedList is what indicates that order is a factor (it's what distinguishes a NamedList from a hypothetical MultiValueMap) But we also use it when order doesn't totally matter, but it's still nice. : Using OrderedNamedList, means that it's really important that order be : maintained, and we could a different strategy such as interleaving : keys and values in a single array (or another strategy set by : json.orderednl?) i would much rather see us change any places in the Solr request handlers where order does *not* matter to just use a Map ...then the ResponseWriter could know that order in Maps doesn't matter, but order in NamedLists do. The problem is that it's not clear cut. Should the top level (containing responseHeader, result, facets, etc) be ordered or unordered? We currently order pretty much everything, even when it's slightly redundant (we highlight docs in order, but we also include a unique id). Even when order doesn't strictly matter, it's still nice to see things ordered in the response (the responseHeader first for instance). So what's your proposal for what a facet list should look like in JSON? -Yonik
[jira] Resolved: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley resolved SOLR-80.
------------------------------
    Resolution: Fixed

committed.  Thanks for the review Mike!

> negative filter queries
> -----------------------
>
>         Key: SOLR-80
>         URL: https://issues.apache.org/jira/browse/SOLR-80
>     Project: Solr
>  Issue Type: New Feature
>  Components: search
>    Reporter: Yonik Seeley
> Attachments: negative_filters.patch, negative_filters.patch
>
> There is a need for negative filter queries to avoid long filter generation
> times and large caching requirements.  Currently, if someone wants to filter
> out a small number of documents, they must specify the complete set of
> documents to express those negative conditions against.
>
>   q=foo&fq=id:[* TO *] -id:101
>
> In this example, to filter out a single document, the complete set of
> documents (minus one) is generated, and a large bitset is cached.  You could
> also add the restriction to the main query, but that doesn't work with the
> dismax handler, which doesn't have a facility for this.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: facet response
Chris Hostetter [EMAIL PROTECTED] wrote:
> as i said, i'd rather invert the use case set to find where ordering
> isn't important and change those to Maps

That might be a *lot* of changes...
What's currently broken, just faceting or anything else?

-Yonik
Re: graduation todo list
Here is a *trivial* one: the 'Documentation' link on src/webapp/resources/admin/index.jsp still points to: http://incubator.apache.org/solr/
Re: [jira] Resolved: (SOLR-80) negative filter queries
On Jan 22, 2007, at 4:43 PM, Yonik Seeley (JIRA) wrote:
> Yonik Seeley resolved SOLR-80.
> ------------------------------
>     Resolution: Fixed
>
> committed.  Thanks for the review Mike!

You guys are quick!  I had on my TODO list to review this patch tonight. :)
Re: graduation todo list
Committed, thanks!

	Erik

On Jan 22, 2007, at 7:11 PM, Ryan McKinley wrote:
> Here is a *trivial* one: the 'Documentation' link on
> src/webapp/resources/admin/index.jsp still points to:
> http://incubator.apache.org/solr/
[jira] Commented: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466611 ]

Yonik Seeley commented on SOLR-80:
----------------------------------

You are seeing a MatchAllDocsQuery filter.  If getDocSet(List<Query>) is
called with a single negative query, or getDocSet(Query, Filter) is called
with a null filter and a negative query, we call getDocSet(MatchAllDocsQuery)
to use as a base to andNot the passed query.

If you continue your example with fq=+memory and fq=-memory, you will see
what you expect (only one new filter).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466612 ]

Mike Klaas commented on SOLR-80:
--------------------------------

I think this is due to the last line of this fragment of the patch:

   protected DocSet getDocSet(List<Query> queries) throws IOException {
+    if (queries==null) return null;
+    if (queries.size()==1) return getDocSet(queries.get(0));
     DocSet answer=null;
-    if (queries==null) return null;
-    for (Query q : queries) {
-      if (answer==null) {
-        answer = getDocSet(q);
+
+    boolean[] neg = new boolean[queries.size()];
+    DocSet[] sets = new DocSet[queries.size()];
+
+    int smallestIndex = -1;
+    int smallestCount = Integer.MAX_VALUE;
+    for (int i=0; i<sets.length; i++) {
+      Query q = queries.get(i);
+      Query posQuery = QueryUtils.getAbs(q);
+      sets[i] = getPositiveDocSet(posQuery);

getPositiveDocSet() caches all docsets returned, so both the query part and
the filter part would be cached in the filterCache.
[jira] Commented: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466616 ]

Hoss Man commented on SOLR-80:
------------------------------

I was starting to think the same thing as Mike, but doing more testing i see
what Yonik's referring to (note to self: test more than one query when doing
cache testing) ... only the first use of a negative query results in the
double insert; after that everything is golden.

Mike: i think the key is that unless faceting is turned on, the
StandardRequestHandler only calls getDocList, not getDocListAndSet ... so by
the time the call makes it to getDocListC the flags never contain GET_DOCSET,
so the main query isn't included in the list passed to getDocSet.
[jira] Commented: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466617 ]

Mike Klaas commented on SOLR-80:
--------------------------------

Surely Hoss' example doesn't use matchAllDocs--he has a positive query in
both cases.
Re: [jira] Commented: (SOLR-80) negative filter queries
: Surely Hoss' example doesn't use matchAllDocs--he has a positive query in
: both cases.

no, actually i was testing out a positive filter and then the negative of
that filter, and thought i was seeing cache inserts for both.  what i was
really seeing was a cache insert of the positive and a cache insert of the
matchalldocs.

-Hoss
[jira] Commented: (SOLR-80) negative filter queries
[ https://issues.apache.org/jira/browse/SOLR-80?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12466621 ]

Mike Klaas commented on SOLR-80:
--------------------------------

Hoss: thanks for the explanation.  I might throw this in our production code
this week and see how it fares.
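The mechanics discussed in this thread can be sketched with plain BitSets. This is a toy model, not Solr's DocSet/SolrIndexSearcher code, and the `Filter` pair type here is invented for illustration: each filter query is reduced to its positive form (the part that would land in the filterCache), positive filters are intersected, and negative filters are applied by and-not-ing against either that intersection or, when every query is negative, a match-all base set (the MatchAllDocsQuery base Yonik describes).

```java
import java.util.BitSet;
import java.util.List;

// Toy model of negative filter handling: each query is a (positiveDocs,
// negated) pair, where positiveDocs is the DocSet of the query's positive
// form -- the cache-friendly representation discussed in the thread.
public class NegativeFilterSketch {
    static class Filter {
        final BitSet positiveDocs;
        final boolean negated;
        Filter(BitSet positiveDocs, boolean negated) {
            this.positiveDocs = positiveDocs;
            this.negated = negated;
        }
    }

    static BitSet intersect(List<Filter> filters, int maxDoc) {
        BitSet answer = null;
        // First AND together all the positive filters.
        for (Filter f : filters) {
            if (f.negated) continue;
            if (answer == null) answer = (BitSet) f.positiveDocs.clone();
            else answer.and(f.positiveDocs);
        }
        // If every filter was negative, fall back to a match-all base set
        // (this corresponds to the getDocSet(MatchAllDocsQuery) call).
        if (answer == null) {
            answer = new BitSet(maxDoc);
            answer.set(0, maxDoc);
        }
        // Then andNot away each negative filter's positive form.
        for (Filter f : filters) {
            if (f.negated) answer.andNot(f.positiveDocs);
        }
        return answer;
    }
}
```

With a 5-doc index, a single negative filter like -id:3 yields docs {0,1,2,4}: the match-all base minus the cached positive set for id:3. That fallback set is why the first negative-only filter request shows an extra cache insert, as noted above.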
Re: facet.missing?
: So now that we have negative queries, we don't really need any
: additional/extra code for facet.missing.  It could simply be
: facet.query=-myfield:*, and that way it could be obtained without
: getting facet.field results if desired.

facet.missing can be used on a per-field basis .. but i suspect a more
natural usage of it is to just use facet.missing=true when i always want to
show the user a count for results that don't match any value for each of my
facets.  this...

  q=ipod&facet=true&facet.missing=true&facet.field=inStock&facet.field=cat&facet.field=foo

is nicer than...

  q=ipod&facet=true&facet.field=inStock&facet.field=cat&facet.field=foo&facet.query=-inStock:*&facet.query=-cat:*&facet.query=-foo:*

...particularly when you want to put <str name="facet.missing">true</str> as
a default in your solrconfig.

: Of course we would need to enable zero-length prefix queries in the
: SolrQueryParser for that, but I think we should do that anyway.

hmmm... is that really better than saying foo:[* TO *] ?  ... i guess
syntactically it's nicer, but on the other hand making people spell out the
range query forces them to consciously choose to do it ... much the same way
the *:* syntax for MatchAllDocs works.

actually, that makes me realize: if you support zero-width prefix queries,
then * is going to be parsed as a zero-width prefix on whatever the
defaultSearchField is, and return all results which have a value in that
field ... but that may confuse a lot of people who might assume it is giving
them all docs in the index (and since they are going to get results instead
of errors, they won't have any indication that they are wrong).

: So should we deprecate facet.missing, or is it only really used with
: facet.field queries, and often enough we would want it *in* that list?

well yeah, there's that too ... if you are parsing the facet counts, dealing
with the missing count in the list for each facet field is easier than
correlating back to a facet query -- which involves some annoying string
manipulation.  (where by "easier" and "annoying" i mean: if i had to do this
in XSLT, how painful would it be?)

-Hoss
Re: facet.missing?
On 1/22/07, Chris Hostetter [EMAIL PROTECTED] wrote:
> actually, that makes me realize: if you support zero-width prefix queries,
> then * is going to be parsed as a zero-width prefix on whatever the
> defaultSearchField is and return all results which have a value in that
> field

Hmmm, right... if the QueryParser actually supports parsing that syntax.
I haven't tried it out.  It's just that it normally surprises people that
they can do foo:a* and not foo:*

-Yonik
[jira] Updated: (SOLR-117) constrain field faceting to a prefix
[ https://issues.apache.org/jira/browse/SOLR-117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yonik Seeley updated SOLR-117:
------------------------------
    Attachment: facet_prefix.patch

Full patch w/ tests attached.  This version also implements facet.prefix for
the FieldCache method.  It also lowers the per-request memory used by that
method (the int[] counts array is smaller, since we know beforehand the max
number of terms that match the prefix).  A binary search is used to find the
start and end terms for the prefix.

> constrain field faceting to a prefix
> ------------------------------------
>
>         Key: SOLR-117
>         URL: https://issues.apache.org/jira/browse/SOLR-117
>     Project: Solr
>  Issue Type: New Feature
>  Components: search
>    Reporter: Yonik Seeley
> Attachments: facet_prefix.patch, facet_prefix.patch
>
> Useful for faceting as someone is typing, autocompletion, etc.
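The binary-search step mentioned in the SOLR-117 update can be sketched over a plain sorted String array. This is a standalone illustration, not the actual FieldCache/patch code: the start of the prefix range is the insertion point of the prefix itself, and the end is the insertion point of a key that sorts after every string beginning with that prefix.

```java
import java.util.Arrays;

// Standalone sketch: find the contiguous [start, end) range of a sorted
// term array whose entries begin with `prefix`, via two binary searches.
public class PrefixRangeSketch {
    // Index of the first element >= key (lower bound); assumes unique terms.
    static int lowerBound(String[] sorted, String key) {
        int i = Arrays.binarySearch(sorted, key);
        return i >= 0 ? i : -i - 1;
    }

    // Returns {start, end} such that sorted[start..end) all start with prefix.
    // (Assumes terms don't contain U+FFFF, a Unicode non-character, so
    // prefix + '\uffff' sorts after every legitimate extension of prefix.)
    static int[] prefixRange(String[] sorted, String prefix) {
        int start = lowerBound(sorted, prefix);
        int end = lowerBound(sorted, prefix + '\uffff');
        return new int[] { start, end };
    }
}
```

For terms {"cap", "car", "cat", "dog"} and prefix "ca", this yields the range [0, 3), so a counts array only needs 3 slots instead of one per term in the field, which is the memory saving described above.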