[jira] Commented: (SOLR-1383) Replication causes master to fail to delete old index files

2009-08-28 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748682#action_12748682
 ] 

Lance Norskog commented on SOLR-1383:
-

I checked again - the files do not go away: not after another commit, not after 
restarting Solr.

The replication commit reservation code definitely has a bug.

 Replication causes master to fail to delete old index files
 ---

 Key: SOLR-1383
 URL: https://issues.apache.org/jira/browse/SOLR-1383
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
 Environment: Linux CentOS - latest Solr 1.4 trunk - Java 1.6
Reporter: Lance Norskog
 Fix For: 1.4


 I have developed a way to make replication leave old index files in the 
 master's data/index directory. It is timing-dependent. A sequence of commands 
 runs correctly or fails, depending on the timing between the commands.
 Here is the test scenario:
 Start a master and slave version of the Solr distributed example. I used 8080 
 for the slave. (See example/etc/jetty.xml)
 Be sure to start with empty solr/data/index files on both master and slave.
 Open the replication administration jsp on the slave ( 
 http://localhost:8080/solr/admin/replication/index.jsp )
 Disable polling.
 In a terminal window, go to the example/exampledocs directory and run this script:
 {code}
 for x in *.xml
 do
   echo $x
   sh post.sh $x
   sleep 15
   curl "http://localhost:8080/solr/replication?command=fetchindex"
 done
 {code}
 This prints each example file, indexes it, and does a replication command. At 
 the end of this exercise, the master and slave solr/data/index files will be 
 identical.
 Now, kill the master & slave, remove the solr/data/index directories, and start 
 over.  This time, remove the sleep command from the script. In my 
 environment, old Lucene index files were left in the master's data/index. 
 Here is what is left in the master data/index. 
  The segments_? files are random across runs, but the index files left over 
 are consistent.
 Note (courtesy of the Linux 'ls -l /proc/PID/fd' command) that the old files 
 are not kept open by the master solr; they are merely left behind.
 In the master server:
 {code}
 % ls solr/data/index
 _0.fdt  _1.prx  _2.tvx  _4.nrm  _5.tii  _7.frq  _8.tvd  _a.tvx  _c.nrm
 _0.fdx  _1.tii  _3.fdt  _4.prx  _5.tis  _7.nrm  _8.tvf  _b.fdt  _c.prx
 _0.fnm  _1.tis  _3.fdx  _4.tii  _6.fdt  _7.prx  _8.tvx  _b.fdx  _c.tii
 _0.frq  _2.fdt  _3.fnm  _4.tis  _6.fdx  _7.tii  _a.fdt  _b.fnm  _c.tis
 _0.nrm  _2.fdx  _3.frq  _4.tvd  _6.fnm  _7.tis  _a.fdx  _b.frq  segments.gen
 _0.prx  _2.fnm  _3.nrm  _4.tvf  _6.frq  _8.fdt  _a.fnm  _b.nrm  segments_8
 _0.tii  _2.frq  _3.prx  _4.tvx  _6.nrm  _8.fdx  _a.frq  _b.prx  segments_9
 _0.tis  _2.nrm  _3.tii  _5.fdt  _6.prx  _8.fnm  _a.nrm  _b.tii  segments_a
 _1.fdt  _2.prx  _3.tis  _5.fdx  _6.tii  _8.frq  _a.prx  _b.tis  segments_b
 _1.fdx  _2.tii  _4.fdt  _5.fnm  _6.tis  _8.nrm  _a.tii  _c.fdt  segments_c
 _1.fnm  _2.tis  _4.fdx  _5.frq  _7.fdt  _8.prx  _a.tis  _c.fdx  segments_d
 _1.frq  _2.tvd  _4.fnm  _5.nrm  _7.fdx  _8.tii  _a.tvd  _c.fnm
 _1.nrm  _2.tvf  _4.frq  _5.prx  _7.fnm  _8.tis  _a.tvf  _c.frq
 {code}
 {code}
 % ls -l /proc/PID/fd
 lr-x------ 1 root root 64 Aug 25 22:52 137 -> /index/master/solr/data/index/_a.tis
 lr-x------ 1 root root 64 Aug 25 22:52 138 -> /index/master/solr/data/index/_a.frq
 lr-x------ 1 root root 64 Aug 25 22:52 139 -> /index/master/solr/data/index/_a.prx
 lr-x------ 1 root root 64 Aug 25 22:52 140 -> /index/master/solr/data/index/_a.fdt
 lr-x------ 1 root root 64 Aug 25 22:52 141 -> /index/master/solr/data/index/_a.fdx
 lr-x------ 1 root root 64 Aug 25 22:52 142 -> /index/master/solr/data/index/_a.tvx
 lr-x------ 1 root root 64 Aug 25 22:52 143 -> /index/master/solr/data/index/_a.tvd
 lr-x------ 1 root root 64 Aug 25 22:52 144 -> /index/master/solr/data/index/_a.tvf
 lr-x------ 1 root root 64 Aug 25 22:52 145 -> /index/master/solr/data/index/_a.nrm
 lr-x------ 1 root root 64 Aug 25 22:52 72 -> /index/master/solr/data/index/_b.tis
 lr-x------ 1 root root 64 Aug 25 22:52 73 -> /index/master/solr/data/index/_b.frq
 lr-x------ 1 root root 64 Aug 25 22:52 74 -> /index/master/solr/data/index/_b.prx
 lr-x------ 1 root root 64 Aug 25 22:52 76 -> /index/master/solr/data/index/_b.fdt
 lr-x------ 1 root root 64 Aug 25 22:52 78 -> /index/master/solr/data/index/_b.fdx
 lr-x------ 1 root root 64 Aug 25 22:52 79 -> /index/master/solr/data/index/_b.nrm
 lr-x------ 1 root root 64 Aug 25 22:52 80 -> /index/master/solr/data/index/_c.tis
 lr-x------ 1 root root 64 Aug 25 22:52 81 -> /index/master/solr/data/index/_c.frq
 lr-x------ 1 root root 64 Aug 25 22:52 82 -> /index/master/solr/data/index/_c.prx
 lr-x------ 1 root root 64 Aug 25 22:52 83 -> 
 

[jira] Commented: (SOLR-1383) Replication causes master to fail to delete old index files

2009-08-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748688#action_12748688
 ] 

Noble Paul commented on SOLR-1383:
--

bq. The files do not go away. Not after another commit, not after restarting Solr.

All the old files are necessary for the index to work; the latest commit point is not 
the only one in use. Do an optimize and the old files will go away.
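
For reference, the optimize can be issued over HTTP by posting an <optimize/> message to 
the update handler; the sketch below assumes the example master on its default port 8983:

{code}
curl http://localhost:8983/solr/update -H 'Content-type:text/xml; charset=utf-8' --data-binary '<optimize/>'
{code}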

 Replication causes master to fail to delete old index files
 ---

 Key: SOLR-1383
 URL: https://issues.apache.org/jira/browse/SOLR-1383
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
 Environment: Linux CentOS - latest Solr 1.4 trunk - Java 1.6
Reporter: Lance Norskog
 Fix For: 1.4


 I have developed a way to make replication leave old index files in the 
 master's data/index directory. It is timing-dependent. A sequence of commands 
 runs correctly or fails, depending on the timing between the commands.
 Here is the test scenario:
 Start a master and slave version of the Solr distributed example. I used 8080 
 for the slave. (See example/etc/jetty.xml)
 Be sure to start with empty solr/data/index files on both master and slave.
 Open the replication administration jsp on the slave ( 
 http://localhost:8080/solr/admin/replication/index.jsp )
 Disable polling.
 In a terminal window, go to the example/exampledocs directory and run this script:
 {code}
 for x in *.xml
 do
   echo $x
   sh post.sh $x
   sleep 15
   curl "http://localhost:8080/solr/replication?command=fetchindex"
 done
 {code}
 This prints each example file, indexes it, and does a replication command. At 
 the end of this exercise, the master and slave solr/data/index files will be 
 identical.
 Now, kill the master & slave, remove the solr/data/index directories, and start 
 over.  This time, remove the sleep command from the script. In my 
 environment, old Lucene index files were left in the master's data/index. 
 Here is what is left in the master data/index. 
  The segments_? files are random across runs, but the index files left over 
 are consistent.
 Note (courtesy of the Linux 'ls -l /proc/PID/fd' command) that the old files 
 are not kept open by the master solr; they are merely left behind.
 In the master server:
 {code}
 % ls solr/data/index
 _0.fdt  _1.prx  _2.tvx  _4.nrm  _5.tii  _7.frq  _8.tvd  _a.tvx  _c.nrm
 _0.fdx  _1.tii  _3.fdt  _4.prx  _5.tis  _7.nrm  _8.tvf  _b.fdt  _c.prx
 _0.fnm  _1.tis  _3.fdx  _4.tii  _6.fdt  _7.prx  _8.tvx  _b.fdx  _c.tii
 _0.frq  _2.fdt  _3.fnm  _4.tis  _6.fdx  _7.tii  _a.fdt  _b.fnm  _c.tis
 _0.nrm  _2.fdx  _3.frq  _4.tvd  _6.fnm  _7.tis  _a.fdx  _b.frq  segments.gen
 _0.prx  _2.fnm  _3.nrm  _4.tvf  _6.frq  _8.fdt  _a.fnm  _b.nrm  segments_8
 _0.tii  _2.frq  _3.prx  _4.tvx  _6.nrm  _8.fdx  _a.frq  _b.prx  segments_9
 _0.tis  _2.nrm  _3.tii  _5.fdt  _6.prx  _8.fnm  _a.nrm  _b.tii  segments_a
 _1.fdt  _2.prx  _3.tis  _5.fdx  _6.tii  _8.frq  _a.prx  _b.tis  segments_b
 _1.fdx  _2.tii  _4.fdt  _5.fnm  _6.tis  _8.nrm  _a.tii  _c.fdt  segments_c
 _1.fnm  _2.tis  _4.fdx  _5.frq  _7.fdt  _8.prx  _a.tis  _c.fdx  segments_d
 _1.frq  _2.tvd  _4.fnm  _5.nrm  _7.fdx  _8.tii  _a.tvd  _c.fnm
 _1.nrm  _2.tvf  _4.frq  _5.prx  _7.fnm  _8.tis  _a.tvf  _c.frq
 {code}
 {code}
 % ls -l /proc/PID/fd
 lr-x------ 1 root root 64 Aug 25 22:52 137 -> /index/master/solr/data/index/_a.tis
 lr-x------ 1 root root 64 Aug 25 22:52 138 -> /index/master/solr/data/index/_a.frq
 lr-x------ 1 root root 64 Aug 25 22:52 139 -> /index/master/solr/data/index/_a.prx
 lr-x------ 1 root root 64 Aug 25 22:52 140 -> /index/master/solr/data/index/_a.fdt
 lr-x------ 1 root root 64 Aug 25 22:52 141 -> /index/master/solr/data/index/_a.fdx
 lr-x------ 1 root root 64 Aug 25 22:52 142 -> /index/master/solr/data/index/_a.tvx
 lr-x------ 1 root root 64 Aug 25 22:52 143 -> /index/master/solr/data/index/_a.tvd
 lr-x------ 1 root root 64 Aug 25 22:52 144 -> /index/master/solr/data/index/_a.tvf
 lr-x------ 1 root root 64 Aug 25 22:52 145 -> /index/master/solr/data/index/_a.nrm
 lr-x------ 1 root root 64 Aug 25 22:52 72 -> /index/master/solr/data/index/_b.tis
 lr-x------ 1 root root 64 Aug 25 22:52 73 -> /index/master/solr/data/index/_b.frq
 lr-x------ 1 root root 64 Aug 25 22:52 74 -> /index/master/solr/data/index/_b.prx
 lr-x------ 1 root root 64 Aug 25 22:52 76 -> /index/master/solr/data/index/_b.fdt
 lr-x------ 1 root root 64 Aug 25 22:52 78 -> /index/master/solr/data/index/_b.fdx
 lr-x------ 1 root root 64 Aug 25 22:52 79 -> /index/master/solr/data/index/_b.nrm
 lr-x------ 1 root root 64 Aug 25 22:52 80 -> /index/master/solr/data/index/_c.tis
 lr-x------ 1 root root 64 Aug 25 22:52 81 -> /index/master/solr/data/index/_c.frq
 lr-x------ 1 root root 64 Aug 25 22:52 82 -> 
 

Solr nightly build failure

2009-08-28 Thread solr-dev

init-forrest-entities:
[mkdir] Created dir: /tmp/apache-solr-nightly/build
[mkdir] Created dir: /tmp/apache-solr-nightly/build/web

compile-solrj:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solrj
[javac] Compiling 84 source files to /tmp/apache-solr-nightly/build/solrj
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compile:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/solr
[javac] Compiling 373 source files to /tmp/apache-solr-nightly/build/solr
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

compileTests:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/tests
[javac] Compiling 167 source files to /tmp/apache-solr-nightly/build/tests
[javac] Note: Some input files use or override a deprecated API.
[javac] Note: Recompile with -Xlint:deprecation for details.
[javac] Note: Some input files use unchecked or unsafe operations.
[javac] Note: Recompile with -Xlint:unchecked for details.

junit:
[mkdir] Created dir: /tmp/apache-solr-nightly/build/test-results
[junit] Running org.apache.solr.BasicFunctionalityTest
[junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 27.792 sec
[junit] Running org.apache.solr.ConvertedLegacyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 16.196 sec
[junit] Running org.apache.solr.DisMaxRequestHandlerTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 11.192 sec
[junit] Running org.apache.solr.EchoParamsTest
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 4.433 sec
[junit] Running org.apache.solr.MinimalSchemaTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 6.59 sec
[junit] Running org.apache.solr.OutputWriterTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 4.107 sec
[junit] Running org.apache.solr.SampleTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 6.926 sec
[junit] Running org.apache.solr.SolrInfoMBeanTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.276 sec
[junit] Running org.apache.solr.TestDistributedSearch
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 61.66 sec
[junit] Running org.apache.solr.TestSolrCoreProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.895 sec
[junit] Running org.apache.solr.TestTrie
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 13.585 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.959 sec
[junit] Running org.apache.solr.analysis.DoubleMetaphoneFilterTest
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.817 sec
[junit] Running org.apache.solr.analysis.EnglishPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.433 sec
[junit] Running org.apache.solr.analysis.HTMLStripCharFilterTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 1.143 sec
[junit] Running org.apache.solr.analysis.LengthFilterTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.824 sec
[junit] Running org.apache.solr.analysis.SnowballPorterFilterFactoryTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.999 sec
[junit] Running org.apache.solr.analysis.TestBufferedTokenStream
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.122 sec
[junit] Running org.apache.solr.analysis.TestCapitalizationFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 1.52 sec
[junit] Running 
org.apache.solr.analysis.TestDelimitedPayloadTokenFilterFactory
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 7.575 sec
[junit] Running org.apache.solr.analysis.TestHyphenatedWordsFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.658 sec
[junit] Running org.apache.solr.analysis.TestKeepFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.589 sec
[junit] Running org.apache.solr.analysis.TestKeepWordFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.876 sec
[junit] Running org.apache.solr.analysis.TestMappingCharFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.655 sec
[junit] Running org.apache.solr.analysis.TestPatternReplaceFilter
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 3.489 sec
[junit] Running 

Build failed in Hudson: Solr-trunk #907

2009-08-28 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/907/changes

Changes:

[noble] SOLR-1391 The XPath field in the XPathEntityResolver should use the 
resolver to replace possible tokens

[gsingers] Add a get started section to the front page

[yonik] AutoCommitTest: no more guessing about when a commit has finished

--
[...truncated 2227 lines...]
[junit] Running org.apache.solr.analysis.TestPatternTokenizerFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.848 sec
[junit] Running org.apache.solr.analysis.TestPhoneticFilter
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 2.388 sec
[junit] Running org.apache.solr.analysis.TestRemoveDuplicatesTokenFilter
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 3.182 sec
[junit] Running org.apache.solr.analysis.TestStopFilterFactory
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.254 sec
[junit] Running org.apache.solr.analysis.TestSynonymFilter
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 9.361 sec
[junit] Running org.apache.solr.analysis.TestSynonymMap
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 4.836 sec
[junit] Running org.apache.solr.analysis.TestTrimFilter
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.489 sec
[junit] Running org.apache.solr.analysis.TestWordDelimiterFilter
[junit] Tests run: 14, Failures: 0, Errors: 0, Time elapsed: 44.253 sec
[junit] Running org.apache.solr.client.solrj.SolrExceptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.955 sec
[junit] Running org.apache.solr.client.solrj.SolrQueryTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.517 sec
[junit] Running org.apache.solr.client.solrj.TestBatchUpdate
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 24.938 sec
[junit] Running org.apache.solr.client.solrj.TestLBHttpSolrServer
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 19.087 sec
[junit] Running org.apache.solr.client.solrj.beans.TestDocumentObjectBinder
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.086 sec
[junit] Running org.apache.solr.client.solrj.embedded.JettyWebappTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 11.272 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.759 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.43 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.996 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.491 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 7.907 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.219 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 17.75 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 34.326 sec
[junit] Running 
org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 33.91 sec
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.249 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.526 sec
[junit] Running 
org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.512 sec
[junit] Running 
org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.688 sec
[junit] Running 
org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.86 sec
[junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.292 sec
[junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 14.777 sec
[junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, 

[jira] Created: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Reuben Firmin (JIRA)
NPE on replication page on slave


 Key: SOLR-1392
 URL: https://issues.apache.org/jira/browse/SOLR-1392
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
Reporter: Reuben Firmin


On our slave's replication page, I periodically see this exception. 

java.lang.NullPointerException
at 
_jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
at com.caucho.jsp.Page.pageservice(Page.java:578)
at 
com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
at 
com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
at 
com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
at 
com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
at 
com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
at 
com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
at 
com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
at 
com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
at 
com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
at 
com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
at java.lang.Thread.run(Thread.java:619)

Date: Fri, 28 Aug 2009 13:53:59 GMT
Server: Apache/2.2.3 (Red Hat)
Content-Type: text/html; charset=utf-8
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 524
Connection: close


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Mucking with DocSlices

2009-08-28 Thread Grant Ingersoll
I'd like to be able to change the DocList in a SearchComponent, for  
instance to shorten or lengthen it.  I can get a shorter one via the  
subset() method, but the problem with this is the new subset still  
reflects the number of matches, etc. of the parent, which seems a  
little odd to me.  If I say String.substring().length(), I wouldn't  
expect the length returned to be the same as the parent (unless of  
course the substring requested is the identity one), so I'm not sure  
why DocSlice.subset does.  Likewise for the maxScore, etc.


Is there a reason why, if I know I have a DocSlice, I can't cast the  
docList to it and make some of these lower level changes to the member  
variables?  It would be a lot more efficient than having to copy over  
all the docs, etc. to a new DocSlice.


Thanks,
Grant


Re: [jira] Created: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
By any chance can you share that file _index__jsp.java ?

On Fri, Aug 28, 2009 at 7:32 PM, Reuben Firmin (JIRA) <j...@apache.org> wrote:
 NPE on replication page on slave
 

                 Key: SOLR-1392
                 URL: https://issues.apache.org/jira/browse/SOLR-1392
             Project: Solr
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.4
            Reporter: Reuben Firmin


 On our slave's replication page, I periodically see this exception.

 java.lang.NullPointerException
        at 
 _jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
        at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
        at com.caucho.jsp.Page.pageservice(Page.java:578)
        at 
 com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
        at 
 com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
        at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
        at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
        at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
        at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
        at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
        at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
        at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
        at 
 com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
        at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
        at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
        at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
        at java.lang.Thread.run(Thread.java:619)

 Date: Fri, 28 Aug 2009 13:53:59 GMT
 Server: Apache/2.2.3 (Red Hat)
 Content-Type: text/html; charset=utf-8
 Vary: Accept-Encoding,User-Agent
 Content-Encoding: gzip
 Content-Length: 524
 Connection: close


 --
 This message is automatically generated by JIRA.
 -
 You can reply to this email to add a comment to the issue online.





-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


[jira] Assigned: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul reassigned SOLR-1392:


Assignee: Noble Paul

 NPE on replication page on slave
 

 Key: SOLR-1392
 URL: https://issues.apache.org/jira/browse/SOLR-1392
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
Reporter: Reuben Firmin
Assignee: Noble Paul

 On our slave's replication page, I periodically see this exception. 
 java.lang.NullPointerException
   at 
 _jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
   at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
   at com.caucho.jsp.Page.pageservice(Page.java:578)
   at 
 com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
   at 
 com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
   at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
   at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
   at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
   at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
   at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
   at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
   at java.lang.Thread.run(Thread.java:619)
 Date: Fri, 28 Aug 2009 13:53:59 GMT
 Server: Apache/2.2.3 (Red Hat)
 Content-Type: text/html; charset=utf-8
 Vary: Accept-Encoding,User-Agent
 Content-Encoding: gzip
 Content-Length: 524
 Connection: close

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1393) Allow more control over SearchComponents ordering in SearchHandler

2009-08-28 Thread Grant Ingersoll (JIRA)
Allow more control over SearchComponents ordering in SearchHandler
--

 Key: SOLR-1393
 URL: https://issues.apache.org/jira/browse/SOLR-1393
 Project: Solr
  Issue Type: Improvement
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 1.5


It would be useful to be able to add the notion of before/after when declaring 
search components.  Currently, you can either explicitly declare all components 
or insert at the beginning or end.  It would be nice to be able to say: this 
new component comes after the Query component without having to declare all the 
components.
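
For reference, a sketch of how this might look in solrconfig.xml. The components / 
first-components / last-components arrays are the existing options; the commented-out 
block is purely hypothetical syntax illustrating the request, and the component names 
are placeholders:

{code}
<requestHandler name="/search" class="solr.SearchHandler">
  <!-- today: declare the whole chain explicitly ... -->
  <arr name="components">
    <str>query</str>
    <str>myComponent</str>
    <str>facet</str>
    <str>highlight</str>
    <str>debug</str>
  </arr>
  <!-- ... or only append/prepend with first-components / last-components -->

  <!-- hypothetical: position a component relative to an existing one
       without re-declaring the chain
  <arr name="components-after-query">
    <str>myComponent</str>
  </arr>
  -->
</requestHandler>
{code}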

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Mucking with DocSlices

2009-08-28 Thread Yonik Seeley
On Fri, Aug 28, 2009 at 10:26 AM, Grant Ingersoll <gsing...@apache.org> wrote:
  If I say
 String.substring().length(), I wouldn't expect the length returned to be the
 same as the parent (unless of course the substring requested is the identity
 one), so I'm not sure why DocSlice.subset does.

.size() should reflect the new size.
.matches() always reflects the total number of matches that this
DocList is a window into.

 Likewise for the maxScore,
 etc.

 Is there a reason why, if I know I have a DocSlice, I can't cast the docList
 to it and make some of these lower level changes to the member variables?
  It would be a lot more efficient than having to copy over all the docs,
 etc. to a new DocSlice.

Just make a new DocSlice - one shouldn't be modifying these since they
can be cached.

-Yonik
http://www.lucidimagination.com


Re: Mucking with DocSlices

2009-08-28 Thread Grant Ingersoll


On Aug 28, 2009, at 1:03 PM, Yonik Seeley wrote:

On Fri, Aug 28, 2009 at 10:26 AM, Grant  
Ingersoll <gsing...@apache.org> wrote:

 If I say
String.substring().length(), I wouldn't expect the length returned  
to be the
same as the parent (unless of course the substring requested is the  
identity

one), so I'm not sure why DocSlice.subset does.


.size() should reflect the new size.
.matches() always reflects the total number of matches that this
DocList is a window into.


 Likewise for the maxScore,
etc.

Is there a reason why, if I know I have a DocSlice, I can't cast  
the docList
to it and make some of these lower level changes to the member  
variables?
 It would be a lot more efficient than having to copy over all the  
docs,

etc. to a new DocSlice.


Just make a new DocSlice - one shouldn't be modifying these since they
can be cached.


Sure, but that requires creating a new int[] doc array, copying  
elements, etc. all over again, and I may not need to do that (for  
instance, if I am shortening the list based on some business rules).


My solution so far is a lightweight wrapper around DocList that seems  
to be working just fine.
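
A minimal sketch of that wrapper idea, using a deliberately simplified stand-in  
interface rather than the real org.apache.solr.search.DocList (which has more  
methods); it only shows the delegation pattern: forward everything, override size().

// Hypothetical, simplified stand-in for the DocList accessors discussed in this thread.
interface SimpleDocList {
  int size();         // ids visible in this window
  int matches();      // total hits the window was cut from
  float maxScore();
}

// Lightweight wrapper: delegate everything, cap the visible size, copy nothing.
final class TruncatedDocList implements SimpleDocList {
  private final SimpleDocList delegate;
  private final int cappedSize;

  TruncatedDocList(SimpleDocList delegate, int newSize) {
    this.delegate = delegate;
    this.cappedSize = Math.min(newSize, delegate.size());
  }

  public int size()       { return cappedSize; }          // shortened view
  public int matches()    { return delegate.matches(); }  // totals unchanged
  public float maxScore() { return delegate.maxScore(); }
}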


[jira] Commented: (SOLR-1301) Solr + Hadoop

2009-08-28 Thread jv ning (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748935#action_12748935
 ] 

jv ning commented on SOLR-1301:
---

I have used this at a decent scale, and will be adding a few patches to allow 
multiple tasks per machine to build.

The code currently uses the same directory in /tmp for the solr config, and if 
multiple tasks are running, the directory may be removed by earlier tasks that 
finish.
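
A sketch of one way to avoid the collision, assuming the scratch directory can be keyed 
on the Hadoop task attempt id (mapred.task.id is the 0.19-era property; the class and 
directory naming below are illustrative only):

{code}
import java.io.File;
import org.apache.hadoop.mapred.JobConf;

public final class TaskScratchDir {
  /** Per-task scratch solr.home, e.g. /tmp/solr-home-attempt_200908281200_0001_r_000003_0 */
  public static File solrHomeFor(JobConf job) {
    String attempt = job.get("mapred.task.id", "standalone");
    File dir = new File(System.getProperty("java.io.tmpdir"), "solr-home-" + attempt);
    dir.mkdirs();
    return dir;
  }
}
{code}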

 Solr + Hadoop
 -

 Key: SOLR-1301
 URL: https://issues.apache.org/jira/browse/SOLR-1301
 Project: Solr
  Issue Type: Improvement
Affects Versions: 1.4
Reporter: Andrzej Bialecki 
 Attachments: hadoop-0.19.1-core.jar, hadoop.patch


 This patch contains  a contrib module that provides distributed indexing 
 (using Hadoop) to Solr EmbeddedSolrServer. The idea behind this module is 
 twofold:
 * provide an API that is familiar to Hadoop developers, i.e. that of 
 OutputFormat
 * avoid unnecessary export and (de)serialization of data maintained on HDFS. 
 SolrOutputFormat consumes data produced by reduce tasks directly, without 
 storing it in intermediate files. Furthermore, by using an 
 EmbeddedSolrServer, the indexing task is split into as many parts as there 
 are reducers, and the data to be indexed is not sent over the network.
 Design
 --
 Key/value pairs produced by reduce tasks are passed to SolrOutputFormat, 
 which in turn uses SolrRecordWriter to write this data. SolrRecordWriter 
 instantiates an EmbeddedSolrServer, and it also instantiates an 
 implementation of SolrDocumentConverter, which is responsible for turning 
 Hadoop (key, value) into a SolrInputDocument. This data is then added to a 
 batch, which is periodically submitted to EmbeddedSolrServer. When reduce 
 task completes, and the OutputFormat is closed, SolrRecordWriter calls 
 commit() and optimize() on the EmbeddedSolrServer.
 The API provides facilities to specify an arbitrary existing solr.home 
 directory, from which the conf/ and lib/ files will be taken.
 This process results in the creation of as many partial Solr home directories 
 as there were reduce tasks. The output shards are placed in the output 
 directory on the default filesystem (e.g. HDFS). Such part-N directories 
 can be used to run N shard servers. Additionally, users can specify the 
 number of reduce tasks, in particular 1 reduce task, in which case the output 
 will consist of a single shard.
 An example application is provided that processes large CSV files and uses 
 this API. It uses a custom CSV processing to avoid (de)serialization overhead.
 This patch relies on hadoop-core-0.19.1.jar - I attached the jar to this 
 issue, you should put it in contrib/hadoop/lib.
 Note: the development of this patch was sponsored by an anonymous contributor 
 and approved for release under Apache License.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1255) An attempt to visit the replication admin page when its not a defined handler should display an approp message

2009-08-28 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748939#action_12748939
 ] 

Grant Ingersoll commented on SOLR-1255:
---

Is this fixed?

 An attempt to visit the replication admin page when its not a defined handler 
 should display an approp message
 --

 Key: SOLR-1255
 URL: https://issues.apache.org/jira/browse/SOLR-1255
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Noble Paul
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1255.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1221) Change Solr Highlighting to use the SpanScorer with MultiTerm expansion by default

2009-08-28 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1221?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748940#action_12748940
 ] 

Grant Ingersoll commented on SOLR-1221:
---

Is this going to make it into 1.4?

 Change Solr Highlighting to use the SpanScorer with MultiTerm expansion by 
 default
 --

 Key: SOLR-1221
 URL: https://issues.apache.org/jira/browse/SOLR-1221
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Mark Miller
Assignee: Mark Miller
 Fix For: 1.4


 To improve the out of the box experience of Solr 1.4, I really think we 
 should make this change. You will still be able to turn both off.
 Comments?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Lucene RC2

2009-08-28 Thread Grant Ingersoll
Anyone tried out the new Lucene RC2 in Solr yet?  Should we upgrade to  
it?


Re: Lucene RC2

2009-08-28 Thread Ryan McKinley

have not tried it yet but we should certainly upgrade.

the more testing the better!


On Aug 28, 2009, at 2:54 PM, Grant Ingersoll wrote:

Anyone tried out the new Lucene RC2 in Solr yet?  Should we upgrade  
to it?




[jira] Resolved: (SOLR-1091) phps (serialized PHP) writer produces invalid output

2009-08-28 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1091.


Resolution: Fixed

committed.

 phps (serialized PHP) writer produces invalid output
 --

 Key: SOLR-1091
 URL: https://issues.apache.org/jira/browse/SOLR-1091
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
 Environment: Sun JRE 1.6.0 on Centos 5
Reporter: frank farmer
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1091.patch


 The serialized PHP output writer can output invalid string lengths for 
 certain (unusual) input values.  Specifically, I had a document containing 
 the following 6 byte character sequence: \xED\xAF\x80\xED\xB1\xB8
 I was able to create a document in the index containing this value without 
 issue; however, when fetching the document back out using the serialized PHP 
 writer, it returns a string like the following:
 s:4:"􀁸";
 Note that the string length specified is 4, while the string is actually 6 
 bytes long.
 When using PHP's native serialize() function, it correctly sets the length to 
 6:
 # php -r 'var_dump(serialize("\xED\xAF\x80\xED\xB1\xB8"));'
 string(13) "s:6:"􀁸";"
 The wt=php writer, which produces output to be parsed with eval(), doesn't 
 have any trouble with this string.
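
For anyone verifying a fix, a quick way to see the failure mode (using a plain ASCII 
placeholder string rather than the surrogate bytes above): PHP rejects a serialized 
string whose declared length does not match the bytes that follow.

{code}
# correct byte count round-trips
php -r 'var_dump(unserialize("s:6:\"abcdef\";"));'   # string(6) "abcdef"
# wrong byte count is rejected: a notice plus bool(false)
php -r 'var_dump(unserialize("s:4:\"abcdef\";"));'
{code}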

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1383) Replication causes master to fail to delete old index files

2009-08-28 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748964#action_12748964
 ] 

Lance Norskog commented on SOLR-1383:
-

The old files did go away after an optimize. Thank you.

Restarting did not remove them. I suggest that old index files should be 
removed after all runtime requirements for them disappear. They should 
definitely be removed by restarting. Restarting Solr should cure all runtime 
problems; this includes extra files.

There are a lot of Solr sites that want continuous propagation from data source 
to indexing to query. If they use Java replication to poll continuously for 
updates, it will leave vast amounts of junk files behind. 

The current functionality is fine for a Solr 1.4 release, but this issue should 
be fixed after that. Please reopen it and mark it for 1.5.

Thanks.

 Replication causes master to fail to delete old index files
 ---

 Key: SOLR-1383
 URL: https://issues.apache.org/jira/browse/SOLR-1383
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
 Environment: Linux CentOS - latest Solr 1.4 trunk - Java 1.6
Reporter: Lance Norskog
 Fix For: 1.4


 I have developed a way to make replication leave old index files in the 
 master's data/index directory. It is timing-dependent. A sequence of commands 
 runs correctly or fails, depending on the timing between the commands.
 Here is the test scenario:
 Start a master and slave version of the Solr distributed example. I used 8080 
 for the slave. (See example/etc/jetty.xml)
 Be sure to start with empty solr/data/index files on both master and slave.
 Open the replication administration jsp on the slave ( 
 http://localhost:8080/solr/admin/replication/index.jsp )
 Disable polling.
 In a terminal window, go to the example/exampledocs directory and run this script:
 {code}
 for x in *.xml
 do
   echo $x
   sh post.sh $x
   sleep 15
   curl "http://localhost:8080/solr/replication?command=fetchindex"
 done
 {code}
 This prints each example file, indexes it, and does a replication command. At 
 the end of this exercise, the master and slave solr/data/index files will be 
 identical.
 Now, kill the master & slave, remove the solr/data/index directories, and start 
 over.  This time, remove the sleep command from the script. In my 
 environment, old Lucene index files were left in the master's data/index. 
 Here is what is left in the master data/index. 
  The segments_? files are random across runs, but the index files left over 
 are consistent.
 Note (courtesy of the Linux 'ls -l /proc/PID/fd' command) that the old files 
 are not kept open by the master solr; they are merely left behind.
 In the master server:
 {code}
 % ls solr/data/index
 _0.fdt  _1.prx  _2.tvx  _4.nrm  _5.tii  _7.frq  _8.tvd  _a.tvx  _c.nrm
 _0.fdx  _1.tii  _3.fdt  _4.prx  _5.tis  _7.nrm  _8.tvf  _b.fdt  _c.prx
 _0.fnm  _1.tis  _3.fdx  _4.tii  _6.fdt  _7.prx  _8.tvx  _b.fdx  _c.tii
 _0.frq  _2.fdt  _3.fnm  _4.tis  _6.fdx  _7.tii  _a.fdt  _b.fnm  _c.tis
 _0.nrm  _2.fdx  _3.frq  _4.tvd  _6.fnm  _7.tis  _a.fdx  _b.frq  segments.gen
 _0.prx  _2.fnm  _3.nrm  _4.tvf  _6.frq  _8.fdt  _a.fnm  _b.nrm  segments_8
 _0.tii  _2.frq  _3.prx  _4.tvx  _6.nrm  _8.fdx  _a.frq  _b.prx  segments_9
 _0.tis  _2.nrm  _3.tii  _5.fdt  _6.prx  _8.fnm  _a.nrm  _b.tii  segments_a
 _1.fdt  _2.prx  _3.tis  _5.fdx  _6.tii  _8.frq  _a.prx  _b.tis  segments_b
 _1.fdx  _2.tii  _4.fdt  _5.fnm  _6.tis  _8.nrm  _a.tii  _c.fdt  segments_c
 _1.fnm  _2.tis  _4.fdx  _5.frq  _7.fdt  _8.prx  _a.tis  _c.fdx  segments_d
 _1.frq  _2.tvd  _4.fnm  _5.nrm  _7.fdx  _8.tii  _a.tvd  _c.fnm
 _1.nrm  _2.tvf  _4.frq  _5.prx  _7.fnm  _8.tis  _a.tvf  _c.frq
 {code}
 {code}
 % ls -l /proc/PID/fd
 lr-x------ 1 root root 64 Aug 25 22:52 137 -> /index/master/solr/data/index/_a.tis
 lr-x------ 1 root root 64 Aug 25 22:52 138 -> /index/master/solr/data/index/_a.frq
 lr-x------ 1 root root 64 Aug 25 22:52 139 -> /index/master/solr/data/index/_a.prx
 lr-x------ 1 root root 64 Aug 25 22:52 140 -> /index/master/solr/data/index/_a.fdt
 lr-x------ 1 root root 64 Aug 25 22:52 141 -> /index/master/solr/data/index/_a.fdx
 lr-x------ 1 root root 64 Aug 25 22:52 142 -> /index/master/solr/data/index/_a.tvx
 lr-x------ 1 root root 64 Aug 25 22:52 143 -> /index/master/solr/data/index/_a.tvd
 lr-x------ 1 root root 64 Aug 25 22:52 144 -> /index/master/solr/data/index/_a.tvf
 lr-x------ 1 root root 64 Aug 25 22:52 145 -> /index/master/solr/data/index/_a.nrm
 lr-x------ 1 root root 64 Aug 25 22:52 72 -> /index/master/solr/data/index/_b.tis
 lr-x------ 1 root root 64 Aug 25 22:52 73 -> /index/master/solr/data/index/_b.frq
 lr-x------ 1 root root 64 Aug 25 22:52 74 -> /index/master/solr/data/index/_b.prx
 lr-x------ 1 root root 64 Aug 25 

[jira] Commented: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Reuben Firmin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748985#action_12748985
 ] 

Reuben Firmin commented on SOLR-1392:
-

Further debugging - this happens when the master url cannot be reached (i.e. 
does not resolve to a real URL). 

 NPE on replication page on slave
 

 Key: SOLR-1392
 URL: https://issues.apache.org/jira/browse/SOLR-1392
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
Reporter: Reuben Firmin
Assignee: Noble Paul
 Fix For: 1.4


 On our slave's replication page, I periodically see this exception. 
 java.lang.NullPointerException
   at 
 _jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
   at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
   at com.caucho.jsp.Page.pageservice(Page.java:578)
   at 
 com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
   at 
 com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
   at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
   at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
   at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
   at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
   at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
   at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
   at java.lang.Thread.run(Thread.java:619)
 Date: Fri, 28 Aug 2009 13:53:59 GMT
 Server: Apache/2.2.3 (Red Hat)
 Content-Type: text/html; charset=utf-8
 Vary: Accept-Encoding,User-Agent
 Content-Encoding: gzip
 Content-Length: 524
 Connection: close

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (SOLR-659) Explicitly set start and rows per shard for more efficient bulk queries across distributed Solr

2009-08-28 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley reassigned SOLR-659:
-

Assignee: Yonik Seeley

 Explicitly set start and rows per shard for more efficient bulk queries 
 across distributed Solr
 ---

 Key: SOLR-659
 URL: https://issues.apache.org/jira/browse/SOLR-659
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: Brian Whitman
Assignee: Yonik Seeley
Priority: Minor
 Fix For: 1.4

 Attachments: shards.start_rows.patch, SOLR-659.patch


 The default behavior of setting start and rows on distributed solr (SOLR-303) 
 is to set start at 0 across all shards and set rows to start+rows across each 
 shard. This ensures all results are returned for any arbitrary start and rows 
 setting, but during bulk queries (where start is incrementally increased 
 and rows is kept consistent) the client would need finer control of the 
 per-shard start and rows parameter as retrieving many thousands of documents 
 becomes intractable as start grows higher.
 Attaching a patch that creates a shards.start and shards.rows parameter. If 
 used, the logic that sets rows to start+rows per shard is overridden and each 
 shard gets the exact start and rows set in shards.start and shards.rows. The 
 client will receive up to shards.rows * nShards results and should set rows 
 accordingly. This makes bulk queries across distributed solr possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-659) Explicitly set start and rows per shard for more efficient bulk queries across distributed Solr

2009-08-28 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12748986#action_12748986
 ] 

Yonik Seeley commented on SOLR-659:
---

I agree this makes sense to enable efficient bulk operations, and also fits in 
with a past idea I had about mapping shards.param=foo to param=foo during a 
sub-request.

I'll give it a couple of days and commit if there are no objections.
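
For reference, an illustrative request using the proposed parameters (hosts, shard list 
and query are placeholders); with two shards and shards.rows=100, rows is set to 
shards.rows * nShards as the description suggests:

{code}
curl 'http://localhost:8983/solr/select?q=*:*&shards=host1:8983/solr,host2:8983/solr&shards.start=100000&shards.rows=100&start=0&rows=200'
{code}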

 Explicitly set start and rows per shard for more efficient bulk queries 
 across distributed Solr
 ---

 Key: SOLR-659
 URL: https://issues.apache.org/jira/browse/SOLR-659
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.3
Reporter: Brian Whitman
Priority: Minor
 Fix For: 1.4

 Attachments: shards.start_rows.patch, SOLR-659.patch


 The default behavior of setting start and rows on distributed solr (SOLR-303) 
 is to set start at 0 across all shards and set rows to start+rows across each 
 shard. This ensures all results are returned for any arbitrary start and rows 
 setting, but during bulk queries (where start is incrementally increased 
 and rows is kept consistent) the client would need finer control of the 
 per-shard start and rows parameter as retrieving many thousands of documents 
 becomes intractable as start grows higher.
 Attaching a patch that creates a shards.start and shards.rows parameter. If 
 used, the logic that sets rows to start+rows per shard is overridden and each 
 shard gets the exact start and rows set in shards.start and shards.rows. The 
 client will receive up to shards.rows * nShards results and should set rows 
 accordingly. This makes bulk queries across distributed solr possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: HTML decoder is splitting tokens

2009-08-28 Thread Anders Melchiorsen
Greetings.

I am moving this issue from the solr-user list. As can be seen in the
messages below, I am having problems with the Solr HTML stripper.

After some investigation, I have found the cause to be that the
stripper is replacing the removed HTML with spaces. This obviously
breaks when the HTML is in the middle of a word, like G&uuml;nther.

So, without knowing what I was doing, I hacked together a fix that
uses offset correction instead.

That seemed to work, except that closing tags and attributes still
broke the positioning. With even less of a clue, I replaced read()
with next() in the two methods handling those.

Finally, invalid HTML also gave wrong offsets, and I fixed that by
restoring numRead when rolling back the input stream.

At this point I stopped trying to break it, so there may still be more
problems. Or I might have introduced some problem on my own. Anyway, I
have put the three patches at the bottom of this mail, in case
somebody wants to move along with this issue.



Regards,
Anders.



Anders Melchiorsen m...@spoon.kalibalik.dk writes:

 Hello.

 Thanks for the hints. Still some trouble, though.

 I added just the HTMLStripCharFilterFactory because, according to
 documentation, it should also replace HTML entities. It did, but
 still left a space after the entity, so I got two tokens from
 G&uuml;nther. That seems like a bug?

 Adding MappingCharFilterFactory in front of the HTML stripper (so
 that the latter will not see the entity) does work as expected. That
 is, until I try strings like "use &lt;p&gt; to mark a paragraph",
 where the HTML stripper will then remove parts of the actual text.
 So this approach will not work.


 Finally, I was happy that I could now use an arbitrary tokenizer
 with HTML input. The PatternTokenizer, however, seems to be using
 character offsets corresponding to the output of the char filters,
 and so the highlighting markers end up at the wrong place. Is that a
 bug, or a configuration issue?


 Cheers,
 Anders.


 Koji Sekiguchi wrote:
 Hi Anders,

 Sorry, I don't know this is a bug or a feature, but
 I'd like to show an alternate way if you'd like.

 In Solr trunk, HTMLStripWhitespaceTokenizerFactory is
 marked as deprecated. Instead, the use of HTMLStripCharFilterFactory with
 an arbitrary TokenizerFactory is encouraged.
 And I'd recommend using MappingCharFilterFactory
 to convert character references to real characters.
 That is, you have:

 <fieldType name="textHtml" class="solr.TextField">
   <analyzer>
     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
     <charFilter class="solr.HTMLStripCharFilterFactory"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>

 where the contents of mapping.txt:

 "&uuml;" => "ü"
 "&auml;" => "ä"
 "&iuml;" => "ï"
 "&euml;" => "ë"
 "&ouml;" => "ö"
 : :

 Then run analysis.jsp and see the result.

 Thank you,

 Koji


 Anders Melchiorsen wrote:
 Hi.

 When indexing the string G&uuml;nther with
 HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two
 tokens, "Gü" and "nther".

 Is this a bug, or am I doing something wrong?

 (Using a Solr nightly from 2009-05-29)


 Anders.



commit 1fb2d42181d8effb1b444aa2fa02d86df1d860d7
Author: Anders Melchiorsen m...@spoon.kalibalik.dk
Date:   Fri Aug 28 15:57:03 2009 +0200

Use offset correction instead of inserting spaces into the stream

Fixes G&uuml;nther turning into "Gü nther".

diff --git a/HTMLStripCharFilter.java b/HTMLStripCharFilter.java
index 733d783..e473cef 100644
--- a/HTMLStripCharFilter.java
+++ b/HTMLStripCharFilter.java
@@ -37,7 +37,9 @@ public class HTMLStripCharFilter extends BaseCharFilter {
   private int readAheadLimit = DEFAULT_READ_AHEAD;
   private int safeReadAheadLimit = readAheadLimit - 3;
   private int numWhitespace = 0;
+  private int numWhitespaceCorrected = 0;
   private int numRead = 0;
+  private int numReadLast = 0;
   private int lastMark;
   private Set<String> escapedTags;
 
@@ -674,9 +676,11 @@ public class HTMLStripCharFilter extends BaseCharFilter {
 // where do we have to worry about them?
 // <![CDATA[ unescaped markup ]]>
 if (numWhitespace > 0){
-  numWhitespace--;
-  return ' ';
+  addOffCorrectMap(numReadLast+1-numWhitespaceCorrected, numWhitespaceCorrected+numWhitespace);
+  numWhitespaceCorrected += numWhitespace;
+  numWhitespace = 0;
 }
+numReadLast = numRead;
 //do not limit this one by the READAHEAD
 while(true) {
   int lastNumRead = numRead;

commit 542f5734136bbfd72ae802c30b6c61361268bccf
Author: Anders Melchiorsen m...@spoon.kalibalik.dk
Date:   Fri Aug 28 15:57:29 2009 +0200

Use next() in place of read()

The read() method is our public interface, while next()
is what we use internally to get the next character.

diff --git a/HTMLStripCharFilter.java b/HTMLStripCharFilter.java
index e473cef..ab14de5 100644
--- a/HTMLStripCharFilter.java
+++ b/HTMLStripCharFilter.java
@@ -537,13 +537,13 @@ public class 

[jira] Updated: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Reuben Firmin (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reuben Firmin updated SOLR-1392:


Comment: was deleted

(was: Further debugging - this happens when the master url cannot be reached 
(i.e. does not resolve to a real URL). )

 NPE on replication page on slave
 

 Key: SOLR-1392
 URL: https://issues.apache.org/jira/browse/SOLR-1392
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
Reporter: Reuben Firmin
Assignee: Noble Paul
 Fix For: 1.4


 On our slave's replication page, I periodically see this exception. 
 java.lang.NullPointerException
   at 
 _jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
   at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
   at com.caucho.jsp.Page.pageservice(Page.java:578)
   at 
 com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
   at 
 com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
   at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
   at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
   at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
   at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
   at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
   at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
   at java.lang.Thread.run(Thread.java:619)
 Date: Fri, 28 Aug 2009 13:53:59 GMT
 Server: Apache/2.2.3 (Red Hat)
 Content-Type: text/html; charset=utf-8
 Vary: Accept-Encoding,User-Agent
 Content-Encoding: gzip
 Content-Length: 524
 Connection: close

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Reuben Firmin (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749008#action_12749008
 ] 

Reuben Firmin commented on SOLR-1392:
-

There's some issue on the master. What does "host" mean in this context?

http://master/replication?command=details&wt=xml

java.lang.IllegalArgumentException: host parameter is null
at 
org.apache.commons.httpclient.HttpConnection.<init>(HttpConnection.java:206)
at 
org.apache.commons.httpclient.HttpConnection.<init>(HttpConnection.java:155)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionWithReference.<init>(MultiThreadedHttpConnectionManager.java:1145)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.createConnection(MultiThreadedHttpConnectionManager.java:762)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:476)
at 
org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
at 
org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
at 
org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
at 
org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:192)
at 
org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:187)
at 
org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:589)
at 
org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:180)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
at 
com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
at 
com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
at 
com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
at 
com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
at java.lang.Thread.run(Thread.java:619)

Date: Fri, 28 Aug 2009 22:22:53 GMT
Server: Apache/2.2.3 (Red Hat)
Cache-Control: no-cache, no-store
Pragma: no-cache
Expires: Sat, 01 Jan 2000 01:00:00 GMT
Content-Type: text/html; charset=UTF-8
Vary: Accept-Encoding,User-Agent
Content-Encoding: gzip
Content-Length: 713
Connection: close



 NPE on replication page on slave
 

 Key: SOLR-1392
 URL: https://issues.apache.org/jira/browse/SOLR-1392
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
Reporter: Reuben Firmin
Assignee: Noble Paul
 Fix For: 1.4


 On our slave's replication page, I periodically see this exception. 
 java.lang.NullPointerException
   at 
 _jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
   at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
   at com.caucho.jsp.Page.pageservice(Page.java:578)
   at 
 com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
   at 
 com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
   at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
   at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
   at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
   at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
   at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
   at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
   at 
 com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
   at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
  

[jira] Commented: (SOLR-1343) HTMLStripCharFilter

2009-08-28 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749021#action_12749021
 ] 

Jason Rutherglen commented on SOLR-1343:


I'm seeing a bug related to this patch going in. It's been hard
to track down and I'm dealing with a JVM bug at the same time,
so I haven't had time to write a test case yet. 

In summary, when I revert to the previous classes, indexing
goes back to normal.

 HTMLStripCharFilter
 ---

 Key: SOLR-1343
 URL: https://issues.apache.org/jira/browse/SOLR-1343
 Project: Solr
  Issue Type: Improvement
  Components: Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1343.patch


 Introducing HTMLStripCharFilter:
 * move html strip logic from HTMLStripReader to HTMLStripCharFilter
 * make HTMLStripReader deprecated
 * make HTMLStrip*TokenizerFactory deprecated

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: HTML decoder is splitting tokens

2009-08-28 Thread Koji Sekiguchi

Anders,

Thank you for attaching the patch. Sorry again, I don't have
enough time to investigate the patch and the problem you are
having, but I'd like to recommend that you open a JIRA issue
and attach the patch so that I or someone else can look into it later.

And I didn't understand this part of your previous mail:

 Adding MappingCharFilterFactory in front of the HTML stripper (so
 that the latter will not see the entity) does work as expected. That
 is, until I try strings like "use &lt;p&gt; to mark a paragraph",
 where the HTML stripper will then remove parts of the actual text.
 So this approach will not work.

Thanks,

Koji

Anders Melchiorsen wrote:

Greetings.

I am moving this issue from the solr-user list. As can be seen in the
messages below, I am having problems with the Solr HTML stripper.

After some investigation, I have found the cause to be that the
stripper is replacing the removed HTML with spaces. This obviously
breaks when the HTML is in the middle of a word, like G&uuml;nther.

So, without knowing what I was doing, I hacked together a fix that
uses offset correction instead.
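
Roughly, the idea is captured by this toy sketch; it only mimics what
BaseCharFilter.addOffCorrectMap() records and is not the actual patch
(the class name and the example offsets are made up):

import java.util.Map;
import java.util.TreeMap;

// Record "from output offset X onward, add D to map back to the input",
// instead of padding the output with spaces where markup was removed.
class OffsetCorrectionSketch {
  // output offset -> cumulative number of input characters collapsed before it
  private final TreeMap<Integer, Integer> corrections = new TreeMap<Integer, Integer>();

  void addOffCorrect(int outputOffset, int cumulativeDiff) {
    corrections.put(outputOffset, cumulativeDiff);
  }

  int correctOffset(int outputOffset) {
    Map.Entry<Integer, Integer> e = corrections.floorEntry(outputOffset);
    return e == null ? outputOffset : outputOffset + e.getValue();
  }

  public static void main(String[] args) {
    // input:  G&uuml;nther  (the entity takes 6 chars, decoded to the single char 'ü')
    // output: Günther
    OffsetCorrectionSketch sketch = new OffsetCorrectionSketch();
    sketch.addOffCorrect(2, 5);  // from output offset 2 on, the input runs 5 chars ahead
    System.out.println(sketch.correctOffset(1)); // 1: 'ü' starts at input offset 1
    System.out.println(sketch.correctOffset(2)); // 7: 'n' sits at input offset 7
  }
}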

That seemed to work, except that closing tags and attributes still
broke the positioning. With even less of a clue, I replaced read()
with next() in the two methods handling those.

Finally, invalid HTML also gave wrong offsets, and I fixed that by
restoring numRead when rolling back the input stream.

At this point I stopped trying to break it, so there may still be more
problems. Or I might have introduced some problem on my own. Anyway, I
have put the three patches at the bottom of this mail, in case
somebody wants to move along with this issue.



Regards,
Anders.



Anders Melchiorsen m...@spoon.kalibalik.dk writes:

  

Hello.

Thanks for the hints. Still some trouble, though.

I added just the HTMLStripCharFilterFactory because, according to
documentation, it should also replace HTML entities. It did, but
still left a space after the entity, so I got two tokens from
G&uuml;nther. That seems like a bug?

Adding MappingCharFilterFactory in front of the HTML stripper (so
that the latter will not see the entity) does work as expected. That
is, until I try strings like "use &lt;p&gt; to mark a paragraph",
where the HTML stripper will then remove parts of the actual text.
So this approach will not work.


Finally, I was happy that I could now use an arbitrary tokenizer
with HTML input. The PatternTokenizer, however, seems to be using
character offsets corresponding to the output of the char filters,
and so the highlighting markers end up at the wrong place. Is that a
bug, or a configuration issue?


Cheers,
Anders.


Koji Sekiguchi wrote:


Hi Anders,

Sorry, I don't know whether this is a bug or a feature, but
I'd like to show an alternative way if you'd like.

In Solr trunk, HTMLStripWhitespaceTokenizerFactory is
marked as deprecated. Instead, you are encouraged to use
HTMLStripCharFilterFactory with an arbitrary TokenizerFactory.
And I'd recommend using MappingCharFilterFactory
to convert character references to real characters.
That is, you have:

<fieldType name="textHtml" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping.txt"/>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>

where the contents of mapping.txt:

"&uuml;" => "ü"
"&auml;" => "ä"
"&iuml;" => "ï"
"&euml;" => "ë"
"&ouml;" => "ö"
: :

Then run analysis.jsp and see the result.

Thank you,

Koji


Anders Melchiorsen wrote:
  

Hi.

When indexing the string G&uuml;nther with
HTMLStripWhitespaceTokenizerFactory (in analysis.jsp), I get two
tokens, "Gü" and "nther".

Is this a bug, or am I doing something wrong?

(Using a Solr nightly from 2009-05-29)


Anders.





commit 1fb2d42181d8effb1b444aa2fa02d86df1d860d7
Author: Anders Melchiorsen m...@spoon.kalibalik.dk
Date:   Fri Aug 28 15:57:03 2009 +0200

Use offset correction instead of inserting spaces into the stream

    Fixes "G&uuml;nther" turning into "Gü nther".


diff --git a/HTMLStripCharFilter.java b/HTMLStripCharFilter.java
index 733d783..e473cef 100644
--- a/HTMLStripCharFilter.java
+++ b/HTMLStripCharFilter.java
@@ -37,7 +37,9 @@ public class HTMLStripCharFilter extends BaseCharFilter {
   private int readAheadLimit = DEFAULT_READ_AHEAD;
   private int safeReadAheadLimit = readAheadLimit - 3;
   private int numWhitespace = 0;
+  private int numWhitespaceCorrected = 0;
   private int numRead = 0;
+  private int numReadLast = 0;
   private int lastMark;
   private Set<String> escapedTags;
 
@@ -674,9 +676,11 @@ public class HTMLStripCharFilter extends BaseCharFilter {

 // where do we have to worry about them?
 // ![ CDATA [ unescaped markup ]]
 if (numWhitespace > 0){
-  numWhitespace--;
-  return ' ';
+  addOffCorrectMap(numReadLast+1-numWhitespaceCorrected, numWhitespaceCorrected+numWhitespace);
+  numWhitespaceCorrected += numWhitespace;
+  

Re: [jira] Commented: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
does it work if you hit the url http://master/replication directly?
On Sat, Aug 29, 2009 at 3:56 AM, Reuben Firmin (JIRA)j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/SOLR-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749008#action_12749008
  ]

 Reuben Firmin commented on SOLR-1392:
 -

 There's some issue on the master. What does "host" mean in this context?

 http://master/replication?command=details&wt=xml

 java.lang.IllegalArgumentException: host parameter is null
        at 
 org.apache.commons.httpclient.HttpConnection.<init>(HttpConnection.java:206)
        at 
 org.apache.commons.httpclient.HttpConnection.<init>(HttpConnection.java:155)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionWithReference.<init>(MultiThreadedHttpConnectionManager.java:1145)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.createConnection(MultiThreadedHttpConnectionManager.java:762)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:476)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
        at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
        at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at 
 org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:192)
        at 
 org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:187)
        at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:589)
        at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:180)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
        at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
        at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
        at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
        at 
 com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
        at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
        at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
        at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
        at java.lang.Thread.run(Thread.java:619)

 Date: Fri, 28 Aug 2009 22:22:53 GMT
 Server: Apache/2.2.3 (Red Hat)
 Cache-Control: no-cache, no-store
 Pragma: no-cache
 Expires: Sat, 01 Jan 2000 01:00:00 GMT
 Content-Type: text/html; charset=UTF-8
 Vary: Accept-Encoding,User-Agent
 Content-Encoding: gzip
 Content-Length: 713
 Connection: close



 NPE on replication page on slave
 

                 Key: SOLR-1392
                 URL: https://issues.apache.org/jira/browse/SOLR-1392
             Project: Solr
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.4
            Reporter: Reuben Firmin
            Assignee: Noble Paul
             Fix For: 1.4


 On our slave's replication page, I periodically see this exception.
 java.lang.NullPointerException
       at 
 _jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
       at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
       at com.caucho.jsp.Page.pageservice(Page.java:578)
       at 
 com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
       at 
 com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
       at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
       at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
       at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
       at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
       at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
       at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
       at 
 

Re: [jira] Commented: (SOLR-1392) NPE on replication page on slave

2009-08-28 Thread Noble Paul നോബിള്‍ नोब्ळ्
BTW which build are you using?

2009/8/29 Noble Paul നോബിള്‍  नोब्ळ् noble.p...@corp.aol.com:
 does it work if you hit the url http://master/replication directly?
 On Sat, Aug 29, 2009 at 3:56 AM, Reuben Firmin (JIRA)j...@apache.org wrote:

    [ 
 https://issues.apache.org/jira/browse/SOLR-1392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749008#action_12749008
  ]

 Reuben Firmin commented on SOLR-1392:
 -

 There's some issue on the master. What does "host" mean in this context?

 http://master/replication?command=details&wt=xml

 java.lang.IllegalArgumentException: host parameter is null
        at 
 org.apache.commons.httpclient.HttpConnection.<init>(HttpConnection.java:206)
        at 
 org.apache.commons.httpclient.HttpConnection.<init>(HttpConnection.java:155)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionWithReference.<init>(MultiThreadedHttpConnectionManager.java:1145)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ConnectionPool.createConnection(MultiThreadedHttpConnectionManager.java:762)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.doGetConnection(MultiThreadedHttpConnectionManager.java:476)
        at 
 org.apache.commons.httpclient.MultiThreadedHttpConnectionManager.getConnectionWithTimeout(MultiThreadedHttpConnectionManager.java:416)
        at 
 org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:153)
        at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at 
 org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at 
 org.apache.solr.handler.SnapPuller.getNamedListResponse(SnapPuller.java:192)
        at 
 org.apache.solr.handler.SnapPuller.getCommandResponse(SnapPuller.java:187)
        at 
 org.apache.solr.handler.ReplicationHandler.getReplicationDetails(ReplicationHandler.java:589)
        at 
 org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:180)
        at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
        at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
        at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
        at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
        at 
 com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:178)
        at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
        at 
 com.caucho.server.hmux.HmuxRequest.handleRequest(HmuxRequest.java:435)
        at com.caucho.server.port.TcpConnection.run(TcpConnection.java:586)
        at com.caucho.util.ThreadPool$Item.runTasks(ThreadPool.java:690)
        at com.caucho.util.ThreadPool$Item.run(ThreadPool.java:612)
        at java.lang.Thread.run(Thread.java:619)

 Date: Fri, 28 Aug 2009 22:22:53 GMT
 Server: Apache/2.2.3 (Red Hat)
 Cache-Control: no-cache, no-store
 Pragma: no-cache
 Expires: Sat, 01 Jan 2000 01:00:00 GMT
 Content-Type: text/html; charset=UTF-8
 Vary: Accept-Encoding,User-Agent
 Content-Encoding: gzip
 Content-Length: 713
 Connection: close



 NPE on replication page on slave
 

                 Key: SOLR-1392
                 URL: https://issues.apache.org/jira/browse/SOLR-1392
             Project: Solr
          Issue Type: Bug
          Components: web gui
    Affects Versions: 1.4
            Reporter: Reuben Firmin
            Assignee: Noble Paul
             Fix For: 1.4


 On our slave's replication page, I periodically see this exception.
 java.lang.NullPointerException
       at 
 _jsp._admin._replication._index__jsp._jspService(_index__jsp.java:265)
       at com.caucho.jsp.JavaPage.service(JavaPage.java:61)
       at com.caucho.jsp.Page.pageservice(Page.java:578)
       at 
 com.caucho.server.dispatch.PageFilterChain.doFilter(PageFilterChain.java:192)
       at 
 com.caucho.server.webapp.DispatchFilterChain.doFilter(DispatchFilterChain.java:97)
       at 
 com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:241)
       at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:280)
       at 
 com.caucho.server.webapp.RequestDispatcherImpl.forward(RequestDispatcherImpl.java:108)
       at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:264)
       at 
 com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:76)
       at 
 com.caucho.server.cache.CacheFilterChain.doFilter(CacheFilterChain.java:158)
       at 
 

[jira] Commented: (SOLR-1383) Replication causes master to fail to delete old index files

2009-08-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749076#action_12749076
 ] 

Noble Paul commented on SOLR-1383:
--

Lance, let me suggest one thing:

# disable the ReplicationHandler
# run your program
# check the file list in the index
# repeat the same set of operations with replication on, and see if there is 
any difference in the number of files (a rough sketch of such a comparison is 
below)
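
Something like this hypothetical helper could print the difference, assuming 
the data/index directory from each run has been copied aside (the paths here 
are made up):

{code}
import java.io.File;
import java.util.Arrays;
import java.util.TreeSet;

public class IndexDirDiff {
  // list the file names in a directory, sorted; empty set if it is missing
  static TreeSet<String> list(String dir) {
    String[] names = new File(dir).list();
    return new TreeSet<String>(Arrays.asList(names == null ? new String[0] : names));
  }

  public static void main(String[] args) {
    TreeSet<String> withoutReplication = list("run-no-replication/solr/data/index");
    TreeSet<String> withReplication = list("run-with-replication/solr/data/index");
    // files that show up only when replication was enabled
    TreeSet<String> leftover = new TreeSet<String>(withReplication);
    leftover.removeAll(withoutReplication);
    System.out.println("files present only after the replication run: " + leftover);
  }
}
{code}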

 Replication causes master to fail to delete old index files
 ---

 Key: SOLR-1383
 URL: https://issues.apache.org/jira/browse/SOLR-1383
 Project: Solr
  Issue Type: Bug
  Components: replication (java)
 Environment: Linux CentOS - latest Solr 1.4 trunk - Java 1.6
Reporter: Lance Norskog
 Fix For: 1.4


 I have developed a way to make replication leave old index files in the 
 master's data/index directory. It is timing-dependent. A sequence of commands 
 runs correctly or fails, depending on the timing between the commands.
 Here is the test scenario:
 Start a master and slave version of the Solr distributed example. I used 8080 
 for the slave. (See example/etc/jetty.xml)
 Be sure to start with empty solr/data/index files on both master and slave.
 Open the replication administration jsp on the slave ( 
 http://localhost:8080/solr/admin/replication/index.jsp )
 Disable polling.
 In a text window, go to the example/exampledocs directory and run this script
 {code}
 for x in *.xml
 do
   echo $x
   sh post.sh $x
   sleep 15
   curl http://localhost:8080/solr/replication?command=fetchindex;
 done
 {code}
 This prints each example file, indexes it, and does a replication command. At 
 the end of this exercise, the master and slave solr/data/index files will be 
 identical.
 Now, kill master  slave, remove the solr/index/data directories, and start 
 over.  This time, remove the sleep command from the script. In my 
 environment, old Lucene index files were left in the master's data/index. 
 Here is what is left in the master data/index. 
  The segments_? files are random across runs, but the index files left over 
 are consistent.
 Note (courtesy of the Linux 'ls -l /proc/PID/fd' command) that the old files 
 are not kept open by the master solr; they are merely left behind.
 In the master server:
 {code}
 % ls solr/data/index
 _0.fdt  _1.prx  _2.tvx  _4.nrm  _5.tii  _7.frq  _8.tvd  _a.tvx  _c.nrm
 _0.fdx  _1.tii  _3.fdt  _4.prx  _5.tis  _7.nrm  _8.tvf  _b.fdt  _c.prx
 _0.fnm  _1.tis  _3.fdx  _4.tii  _6.fdt  _7.prx  _8.tvx  _b.fdx  _c.tii
 _0.frq  _2.fdt  _3.fnm  _4.tis  _6.fdx  _7.tii  _a.fdt  _b.fnm  _c.tis
 _0.nrm  _2.fdx  _3.frq  _4.tvd  _6.fnm  _7.tis  _a.fdx  _b.frq  segments.gen
 _0.prx  _2.fnm  _3.nrm  _4.tvf  _6.frq  _8.fdt  _a.fnm  _b.nrm  segments_8
 _0.tii  _2.frq  _3.prx  _4.tvx  _6.nrm  _8.fdx  _a.frq  _b.prx  segments_9
 _0.tis  _2.nrm  _3.tii  _5.fdt  _6.prx  _8.fnm  _a.nrm  _b.tii  segments_a
 _1.fdt  _2.prx  _3.tis  _5.fdx  _6.tii  _8.frq  _a.prx  _b.tis  segments_b
 _1.fdx  _2.tii  _4.fdt  _5.fnm  _6.tis  _8.nrm  _a.tii  _c.fdt  segments_c
 _1.fnm  _2.tis  _4.fdx  _5.frq  _7.fdt  _8.prx  _a.tis  _c.fdx  segments_d
 _1.frq  _2.tvd  _4.fnm  _5.nrm  _7.fdx  _8.tii  _a.tvd  _c.fnm
 _1.nrm  _2.tvf  _4.frq  _5.prx  _7.fnm  _8.tis  _a.tvf  _c.frq
 {code}
 {code}
 % ls -l /proc/PID/fd
 lr-x-- 1 root root 64 Aug 25 22:52 137 - 
 /index/master/solr/data/index/_a.tis
 lr-x-- 1 root root 64 Aug 25 22:52 138 - 
 /index/master/solr/data/index/_a.frq
 lr-x-- 1 root root 64 Aug 25 22:52 139 - 
 /index/master/solr/data/index/_a.prx
 lr-x-- 1 root root 64 Aug 25 22:52 140 - 
 /index/master/solr/data/index/_a.fdt
 lr-x-- 1 root root 64 Aug 25 22:52 141 - 
 /index/master/solr/data/index/_a.fdx
 lr-x-- 1 root root 64 Aug 25 22:52 142 - 
 /index/master/solr/data/index/_a.tvx
 lr-x-- 1 root root 64 Aug 25 22:52 143 - 
 /index/master/solr/data/index/_a.tvd
 lr-x-- 1 root root 64 Aug 25 22:52 144 - 
 /index/master/solr/data/index/_a.tvf
 lr-x-- 1 root root 64 Aug 25 22:52 145 - 
 /index/master/solr/data/index/_a.nrm
 lr-x-- 1 root root 64 Aug 25 22:52 72 - 
 /index/master/solr/data/index/_b.tis
 lr-x-- 1 root root 64 Aug 25 22:52 73 - 
 /index/master/solr/data/index/_b.frq
 lr-x-- 1 root root 64 Aug 25 22:52 74 - 
 /index/master/solr/data/index/_b.prx
 lr-x-- 1 root root 64 Aug 25 22:52 76 - 
 /index/master/solr/data/index/_b.fdt
 lr-x-- 1 root root 64 Aug 25 22:52 78 - 
 /index/master/solr/data/index/_b.fdx
 lr-x-- 1 root root 64 Aug 25 22:52 79 - 
 /index/master/solr/data/index/_b.nrm
 lr-x-- 1 root root 64 Aug 25 22:52 80 - 
 /index/master/solr/data/index/_c.tis
 lr-x-- 1 root root 64 Aug 25 22:52 81 - 
 /index/master/solr/data/index/_c.frq
 lr-x-- 1 root root 64 Aug 25 22:52 82 - 
 

[jira] Commented: (SOLR-1255) An attempt to visit the replication admin page when its not a defined handler should display an approp message

2009-08-28 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12749077#action_12749077
 ] 

Noble Paul commented on SOLR-1255:
--

The original issue is fixed. But if the user registers multiple ReplicationHandlers, 
then only one will be shown. Ideally, multiple ReplicationHandlers should not be 
registered.

 An attempt to visit the replication admin page when its not a defined handler 
 should display an approp message
 --

 Key: SOLR-1255
 URL: https://issues.apache.org/jira/browse/SOLR-1255
 Project: Solr
  Issue Type: Bug
Reporter: Mark Miller
Assignee: Noble Paul
Priority: Trivial
 Fix For: 1.4

 Attachments: SOLR-1255.patch




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.