RE: [Lucene.Net] Merging 3.0.3 into Trunk

2012-03-01 Thread Prescott Nasser
Jira and then just submit your own patch imo

Sent from my Windows Phone

From: Stefan Bodewig
Sent: 3/1/2012 7:23 AM
To: lucene-net-...@incubator.apache.org
Subject: Re: [Lucene.Net] Merging 3.0.3 into Trunk

On 2012-02-29, Stefan Bodewig wrote:

 On 2012-02-28, Christopher Currens wrote:

 Alright, it's done!  3.0.3 is now merged in with Trunk!

 I'll see to running RAT and looking at the line-ends over the next few
 days so we can get them fixed once and not run into it with the release.

I went for EOLs first and there are 621 files outside of lib and doc
that need to be fixed.  What I have now is not just a patch (of more
than 200k lines), but also a list of 621 files that need their
svn:eol-style property to be set.
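
For reference, setting that property over a file list is mechanical; a sketch,
with an illustrative file name and list file (not the actual commands used):

    svn propset svn:eol-style native src/core/SomeFile.cs
    # batched over a saved list of the 621 files:
    xargs -n 1 svn propset svn:eol-style native < files-to-fix.txt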

I can create a JIRA ticket for that attaching my patch and the list of
files to fix or - since I technically am a committer - could just commit
my cleaned up workspace as is (plus JIRA ticket that I'd open and close
myself).

What would you prefer?

RAT doesn't really make sense before the line feeds are correct (I've
seen quite a few files without license headers by manual inspection).

Stefan


Re: [Lucene.Net] FW: trouble getting cms content to work correctly

2012-03-01 Thread Michael Herndon
Is it safe to add content to the site through the CMS again?  Anything to
be wary of or not explicitly do?

On Wed, Feb 15, 2012 at 6:30 PM, Prescott Nasser geobmx...@hotmail.com wrote:


 Took all day, but Joe was there babysitting and correcting things for us.
 Basically there is a bug in svn 1.6.17 that the CMS is based on, which is
 making our commits a pain at the moment. Once that gets upgraded it should
 be relatively smooth sailing. It won't help us though if we're still
 planning on updating massive amounts of documentation on a regular basis.
 Thanks Joe, I can't thank you enough for the help today.
  ~Prescott

 Date: Wed, 15 Feb 2012 14:49:48 -0800
 From: joe_schae...@yahoo.com
 Subject: Re: trouble getting cms content to work correctly
 To: geobmx...@hotmail.com

 After some testing it appears that this performance bug is fixed in svn
 1.7, but the CMS is currently running 1.6.17.  I hope to have the host
 upgraded within the next 30 days or so, but for now I still recommend using
 the script.

From: Prescott Nasser geobmx...@hotmail.com
  To: joe_schae...@yahoo.com
  Sent: Wednesday, February 15, 2012 5:28 PM
  Subject: RE: trouble getting cms content to work correctly






 Alright - sounds good

 Thanks again!

 ~P

 Date: Wed, 15 Feb 2012 14:25:45 -0800
 From: joe_schae...@yahoo.com
 Subject: Re: trouble getting cms content to work correctly
 To: geobmx...@hotmail.com

 I'm having some svn people look at the merge issues. Right now all I can
 suggest is that you publish using the publish.pl script on
 people.apache.org.  It's taking me about 10 min total to carry that out,
 which is certainly too long given the nature of the changes it's merging,
 but I'll let you know what I find out.

From: Prescott Nasser geobmx...@hotmail.com
  To: joe_schae...@yahoo.com
  Sent: Wednesday, February 15, 2012 5:13 PM
  Subject: RE: trouble getting cms content to work correctly






 It's butt ugly - all in one directory, 8206 files. I'd prefer a more
 natural docs structure, but that's how it gets generated

 ~P

 Date: Wed, 15 Feb 2012 14:10:47 -0800
 From: joe_schae...@yahoo.com
 Subject: Re: trouble getting cms content to work correctly
 To: geobmx...@hotmail.com

 Ok lemee kill it and use the publish.pl script on people to see if I can
 get it to work right. Just curious tho - about how many files do you have
 within that docs dir - all in one dir I presume?
From: Prescott Nasser geobmx...@hotmail.com
  To: joe_schae...@yahoo.com
  Sent: Wednesday, February 15, 2012
  5:08 PM
  Subject: RE: trouble getting cms content to work correctly






 I'm thinking still merge funk

 Date: Wed, 15 Feb 2012 14:05:21 -0800
 From: joe_schae...@yahoo.com
 Subject: Re: trouble getting cms content to work correctly
 To: geobmx...@hotmail.com

 Looks like it just completed.  Hmm, go ahead and publish and let's try this
 one more time.

From: Joe Schaefer
  joe_schae...@yahoo.com
  To: Prescott Nasser geobmx...@hotmail.com
  Sent: Wednesday, February 15, 2012 5:02 PM
  Subject: Re: trouble getting cms content to work correctly


 Yeah more merge funk. Leave it run for now, but don't take any further
 action until you hear from me.

From: Prescott Nasser geobmx...@hotmail.com
  To: joe_schae...@yahoo.com
  Sent: Wednesday, February
  15, 2012 4:59 PM
  Subject: RE: trouble getting cms content to work correctly






 I hate to be the bearer of bad news... still taking days to publish (I'm
 not sure if there is a merge error or not). Let me know and I'll kill this quick.

 Date: Wed, 15 Feb 2012 13:54:52 -0800
 From: joe_schae...@yahoo.com
 Subject: Re: trouble getting cms content to work correctly
 To: geobmx...@hotmail.com

 Yeah try out the webgui and edit/commit/publish a minor change.  It should
 take you no more than a minute or so total.

From: Prescott Nasser geobmx...@hotmail.com
  To: joe_schae...@yahoo.com
  Sent: Wednesday, February 15, 2012 4:52 PM
  Subject: RE: trouble getting cms content to work correctly






 Man that sounds like a tool full of awesome!

 Ok - so for the moment no new docs, a simple edit should be quick?
 Date: Wed, 15 Feb 2012 13:48:40 -0800
 From: joe_schae...@yahoo.com
 Subject: Re: trouble getting cms content to work correctly
 To: geobmx...@hotmail.com

 Ok it's now fixed and your site should work as expected at this point.  I
 had to redact the lazy_publish feature and reserve it for admins only
 because you don't actually have permission to completely remove your
 publication site from svn, and that's not something I can offer without
 providing you and every other committer with the ability to nuke each
 other's entire sites.


 From: Joe Schaefer joe_schae...@yahoo.com
  To: Prescott Nasser geobmx...@hotmail.com
  Sent: Wednesday, February 15, 2012 4:20 PM
  Subject: Re: trouble getting cms content to work correctly


 Yeah well it's probably timing out somewhere along the line. I'm looking
 into your svn tree now to see if I can figure out what's 

RE: [Lucene.Net] FW: trouble getting cms content to work correctly

2012-03-01 Thread Prescott Nasser

I'm told it should only take about 10 minutes to publish now (as opposed to 
~2.5 hours before). No harm in trying ;) ~P
  Date: Thu, 1 Mar 2012 21:47:49 -0500
 From: mhern...@wickedsoftware.net
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: [Lucene.Net] FW: trouble getting cms content to work correctly
 
 Is it safe to add content to the site through the CMS again?  Anything to
 be wary of or not explicitly do?
 

[Lucene.Net] [jira] [Resolved] (LUCENENET-473) Fix linefeeds in more than 600 files

2012-03-01 Thread Stefan Bodewig (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Bodewig resolved LUCENENET-473.
--

Resolution: Fixed

fixed with svn revision 1296052

 Fix linefeeds in more than 600 files
 

 Key: LUCENENET-473
 URL: https://issues.apache.org/jira/browse/LUCENENET-473
 Project: Lucene.Net
  Issue Type: Bug
Affects Versions: Lucene.Net 3.0.3
Reporter: Stefan Bodewig
Assignee: Stefan Bodewig
 Fix For: Lucene.Net 3.0.3


 There are more than 600 files which need the svn:eol-style property set to 
 native and a few that should rather be LF or CRLF.  Many files contain 
 inconsistent line-ends.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




RE: [Lucene.Net] Merging 3.0.3 into Trunk

2012-03-01 Thread Prescott Nasser

Thanks Stefan!
  From: bode...@apache.org
 To: lucene-net-...@incubator.apache.org
 Date: Fri, 2 Mar 2012 06:23:01 +0100
 Subject: Re: [Lucene.Net] Merging 3.0.3 into Trunk
 
 On 2012-03-01, Christopher Currens wrote:
 
  I agree with Prescott.  Make a patch for that sucker! :)
 
 Done
 
 Stefan
  

RE: [Lucene.Net] Official stance on API changes between major versions

2012-03-01 Thread Prescott Nasser

Andy, I appreciate your coppers. Everyone is really quiet. It seems you want 
us to move forward, I want us to move forward, and Chris is actually holding back 
work because of this - let's go for the breaking changes. ~P
  Date: Thu, 1 Mar 2012 23:11:47 +
 From: andy.p...@gmail.com
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: [Lucene.Net] Official stance on API changes between major 
 versions
 
 I am not a committer but my company makes extensive use of Lucene.net. So
 here's my two pennies...
 
 I understand that there is a commendable motivation to be gentle with api
 changes, wanting to give plenty of warning by obsoleting methods.
 
 Several points. First, this is a change to the major version
 number; users should expect changes to the api.
 Next, when this project was restarted last year the stated direction was to
 get caught up with the Java version and also to move towards a more dotnet
 style interface.
 
 The discussions on the list do occasionally get bogged down in this kind of
 to and fro. A coach my sports team once had said something along these
 lines: if the team can't choose then no one has made a convincing
 argument. So make a choice, any choice, and just get on with it. If it turns
 out to be the wrong choice then at least you've learnt something.
 This is software. It's changeable.
 
 My bias is that I want what's in V4 (codecs, NRT etc). I'm willing to take
 some pain if it means this project can accelerate.
 I would imagine that most serious uses of Lucene would be hidden within a
 service or at least isolated in some way, not dotted around all over the
 application. This is what isolation is for, to protect components from
 change. The impact of even fairly major api changes should be quite
 localised and refactorable. Intimidating, yes. More than a bit scary, of
 course. But worth it for getting the newer bits.
 By all means be professional, make proposals, have some discussion. But
 please let's not be too conservative, too timid.
 
 2.9.4g is a good release. We've been using it since shortly after it seemed
 stable. If there are users that need some stability then they should be
 advised to stick with g for a while.
 
 Now that that is done, a hearty thank you for the work on both the code
 and the Apache process. My vote would be for some more radical changes to
 be allowed. Let's get through 3.0.3 and on to 3.5 and 4.0. Let's get to one
 of the original goals, which is functional parity with Java, and let's be bold
 with some of the dotnet modifications (note that being bold does not mean
 that one is reckless).
 
 
 I'm sure that some will say, yeah great sentiment, now send some patches. I
 agree. I have sent some very minor patches previously and it frustrates me
 that my company has not contributed more. We have just taken on a lot more
 people so I hope that we can be more active with Lucene.net soon.
 
 --Andy
 
 On 28 February 2012 18:17, Christopher Currens currens.ch...@gmail.com wrote:
 
  I *really* don't mean to be a bother to anyone, but I'd like to continue
  work on this.  I feel that until I can get a better sense of how the group
  feels about this, I can't make much progress.  Perhaps this radio silence
  is just because this email thread got lost among the others.
 
  On Fri, Feb 24, 2012 at 6:50 PM, Prescott Nasser geobmx...@hotmail.com
  wrote:
 
   I'm not against breaking compatibility when changing the version number to
   a new major (2 to 3). I'm not sure how others feel. Matching Java access
   modifiers seems like the right move.
  
   That said, what if we mark obsolete in 3.0.3 and when we make the jump to
   4.0 wipe them out? In my head we shouldn't spend too much time cleaning up
   3.0.3 aside from bug fixes if we're just going to swap it for 4.0 in the
   near future.
  
   There has to be a break at some point, making it with a major release is
   the best place to make it.
  
   Sent from my Windows Phone
   
   From: Christopher Currens
   Sent: 2/24/2012 2:45 PM
   To: lucene-net-dev@lucene.apache.org
   Subject: [Lucene.Net] Official stance on API changes between major
  versions
  
   A bit of background about what I've been doing lately on the project.
Because we've now confirmed that the .NET 3.0.3 branch is a completed
  port
   of Java 3.0.3 version, I've been spending time trying to work on some of
   the bugs and improvements that are assigned to this version.  There
  wasn't
   any real discussion about the actual features, I just created some (based
   on mailing list discussions) and assigned them to the 3.0.3 release.  The
   improvements I've been working on lately are ones that have bugged me
   specifically since I've started using Lucene.NET.
  
   I've worked on https://issues.apache.org/jira/browse/LUCENENET-468 and
   https://issues.apache.org/jira/browse/LUCENENET-470 so far.
  
   LUCENENET-470 is pretty much completed, all of the classes that
  implemented
   

[jira] [Updated] (SOLR-3162) Continue work on new admin UI

2012-03-01 Thread Stefan Matheis (steffkes) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3162:


Attachment: SOLR-3162.patch

Updated Patch, contains:
* edismax Options on Query-Tab
* Check if System-Information on Dashboard is available
* Fixed Param-Handling on Dataimport
* Autoload™ Functionality on Schema-Browser
* Dummy Debug-Option on Cloud-Tab

 Continue work on new admin UI
 -

 Key: SOLR-3162
 URL: https://issues.apache.org/jira/browse/SOLR-3162
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-3162-index.png, SOLR-3162-schema-browser.png, 
 SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch


 There have been more improvements to how the new UI works, but the current 
 open bugs are getting hard to keep straight. This is the new catch-all JIRA 
 for continued improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-01 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219906#comment-13219906
 ] 

Tommaso Teofili commented on SOLR-3013:
---

thanks Steven, now fixing

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2608) TestReplicationHandler is flakey

2012-03-01 Thread Sami Siren (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219909#comment-13219909
 ] 

Sami Siren commented on SOLR-2608:
--

I am also seeing this test fail quite often. The stacktrace is now different:

{code}
23987 T1101 oasc.SolrException.log SEVERE SnapPull failed 
:org.apache.solr.common.SolrException
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1388)
at 
org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:505)
at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:348)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:298)
at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:163)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 
java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317)
at 
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:180)
at 
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:204)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.util.concurrent.RejectedExecutionException
at 
java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:1768)
at 
java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767)
at 
java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658)
at 
java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:92)
at 
java.util.concurrent.Executors$DelegatedExecutorService.submit(Executors.java:603)
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1356)
... 13 more

25051 T3 oasu.ConcurrentLRUCache.finalize SEVERE ConcurrentLRUCache was not 
destroyed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!
40748 T1120 oasc.SolrException.log SEVERE SnapPull failed 
:org.apache.solr.common.SolrException: Unable to download _7_1.del completely. 
Downloaded 0!=92
at 
org.apache.solr.handler.SnapPuller$FileFetcher.cleanup(SnapPuller.java:1081)
at 
org.apache.solr.handler.SnapPuller$FileFetcher.fetchFile(SnapPuller.java:961)
at 
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:587)
at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:322)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:298)
at 
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:179)
{code}

 TestReplicationHandler is flakey
 

 Key: SOLR-2608
 URL: https://issues.apache.org/jira/browse/SOLR-2608
 Project: Solr
  Issue Type: Bug
Reporter: selckin

 I've been running some while(1) tests on trunk, and TestReplicationHandler is 
 very flakey; it fails about every 10th run.
 Probably not a bug, but the test not waiting correctly
 {code}
 [junit] Testsuite: org.apache.solr.handler.TestReplicationHandler
 [junit] Testcase: org.apache.solr.handler.TestReplicationHandler:   FAILED
 [junit] ERROR: SolrIndexSearcher opens=48 closes=47
 [junit] junit.framework.AssertionFailedError: ERROR: SolrIndexSearcher 
 opens=48 closes=47
 [junit] at 
 org.apache.solr.SolrTestCaseJ4.endTrackingSearchers(SolrTestCaseJ4.java:131)
 [junit] at 
 org.apache.solr.SolrTestCaseJ4.afterClassSolrTestCase(SolrTestCaseJ4.java:74)
 [junit] 
 [junit] 
 [junit] Tests run: 8, Failures: 1, Errors: 0, Time elapsed: 40.772 sec
 [junit] 
 [junit] - Standard Error -
 [junit] 19-Jun-2011 21:26:44 org.apache.solr.handler.SnapPuller 
 fetchLatestIndex
 [junit] SEVERE: Master at: http://localhost:51817/solr/replication is not 
 available. Index fetch failed. Exception: Connection refused
 [junit] 19-Jun-2011 21:26:49 org.apache.solr.common.SolrException log
 [junit] SEVERE: java.util.concurrent.RejectedExecutionException
 [junit] at 
 

[jira] [Commented] (SOLR-3162) Continue work on new admin UI

2012-03-01 Thread Stefan Matheis (steffkes) (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219911#comment-13219911
 ] 

Stefan Matheis (steffkes) commented on SOLR-3162:
-

Sami, yepp also noticed that .. already fixed after taking the screenshot :)

Erick, are the double quotes still there? The patch should remove all 
{{replaceAll}} usages, so the raw content should be visible right now. I'm not 
completely sure which sources are used for the cloud-tab and the 
{{/admin/file}} handler, so it may give you different output ;o

Cloud-Tree expander still not working? Even w/ the latest patch? Just to be 
sure, have you cleared the browser cache?

 Continue work on new admin UI
 -

 Key: SOLR-3162
 URL: https://issues.apache.org/jira/browse/SOLR-3162
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-3162-index.png, SOLR-3162-schema-browser.png, 
 SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch


 There have been more improvements to how the new UI works, but the current 
 open bugs are getting hard to keep straight. This is the new catch-all JIRA 
 for continued improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (SOLR-3162) Continue work on new admin UI

2012-03-01 Thread Stefan Matheis (steffkes) (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219911#comment-13219911
 ] 

Stefan Matheis (steffkes) edited comment on SOLR-3162 at 3/1/12 8:34 AM:
-

Sami, yepp also noticed that .. already fixed after taking the screenshot :)

Erick, are the double quotes still there? The patch should remove all 
{{replaceAll}} usages, so the raw content should be visible right now. I'm not 
completely sure which sources are used for the cloud-tab and the 
{{/admin/file}} handler, so it may give you different output ;o

Cloud-Tree expanding still not working? Even w/ the latest patch? Just to be 
sure, have you cleared the browser cache?

  was (Author: steffkes):
Sami, yepp also noticed that .. already fixed after taking the screenshot :)

Erick, Are the double Quotes still there? The Patch should remove all 
{{replaceAll}} usages, so the raw content should be visible right know. I'm not 
completely sure which sources are used for the cloud-tab and the 
{{/admin/file}} Handler, so it maybe give you different output ;o

Cloud-Tree Expander still not working? Even w/ the latest Patch? Just to be 
sure, cleared the Browser-Cache?
  
 Continue work on new admin UI
 -

 Key: SOLR-3162
 URL: https://issues.apache.org/jira/browse/SOLR-3162
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Erick Erickson
Assignee: Erick Erickson
 Attachments: SOLR-3162-index.png, SOLR-3162-schema-browser.png, 
 SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch, SOLR-3162.patch


 There have been more improvements to how the new UI works, but the current 
 open bugs are getting hard to keep straight. This is the new catch-all JIRA 
 for continued improvements.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Stefan Matheis

2012-03-01 Thread Chris Male
Congrats, Welcome.

On Thu, Mar 1, 2012 at 10:04 AM, Ryan McKinley ryan...@gmail.com wrote:

 I'm pleased to announce that Stefan Matheis has joined our ranks as a
 committer.

 He has given the solr admin UI some much needed love.  It now looks
 like it belongs in 2012!

 Stefan, it is tradition that you introduce yourself with a brief bio.

 Your SVN access should be ready to go.

 Welcome!

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




-- 
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1845 - Failure

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1845/

1 tests failed.
REGRESSION:  org.apache.solr.cloud.RecoveryZkTest.testDistribSearch

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw 
uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:788)
at 
org.apache.lucene.util.LuceneTestCase.access$1100(LuceneTestCase.java:138)
at 
org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:612)
at 
org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:511)
at 
org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:573)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
at 
org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:816)
at 
org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:760)




Build Log (for compile errors):
[...truncated 9846 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2012-03-01 Thread Olivier Dutrieux (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219939#comment-13219939
 ] 

Olivier Dutrieux commented on SOLR-2438:


I tried it yesterday with the 3.6-SNAPSHOT and noticed something strange:

||raw query||parsed query||comment||
|name:LéCTROD\*|name:lectrod\*|works fine|
|name:\*LéCTROD|name:lectrod|{color:red} that removes the wildcard !!!{color}|
|name:\*LéCTROD\*|name:lectrod|{color:red} that removes all wildcards !!!{color}|

I would like to know if it's normal that, when the wildcard is in the first 
position of the raw query, it is removed from the parsed query?

{code:title=schema.xml|borderStyle=solid}
types
fieldtype name=text_fr class=solr.TextField
analyzer type=index
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
filter class=solr.ASCIIFoldingFilterFactory/
filter class=solr.LowerCaseFilterFactory/
/analyzer
analyzer type=query
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
filter class=solr.ASCIIFoldingFilterFactory/
filter class=solr.LowerCaseFilterFactory/
/analyzer
analyzer type=multiterm
tokenizer class=solr.StandardTokenizerFactory /
filter class=solr.StandardFilterFactory/
filter class=solr.ASCIIFoldingFilterFactory/
filter class=solr.LowerCaseFilterFactory/
/analyzer
/fieldtype
/types

fields
field name=name type=text_fr indexed=true stored=true 
multiValued=true/
/fields
{code}


Duto



 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
 Fix For: 3.6, 4.0

 Attachments: SOLR-2438-3x.patch, SOLR-2438-3x.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438_3x.patch


 This patch adds support to allow case-insensitive queries on wildcard 
 searches for configured TextField field types.
 This patch extends the excellent work done Yonik and Michael in SOLR-219.
 The approach here is different enough (imho) to warrant a separate JIRA issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug

2012-03-01 Thread Wenca Petr (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219940#comment-13219940
 ] 

Wenca Petr commented on SOLR-3011:
--

Hi, I've just applied this patch and it solved my problem with multithreaded 
indexing from sql using a berkeley-backed cache, which was opened x times (once 
for each thread) but closed only by one thread, so it remained opened. After the 
patch, the cache is opened only once and properly closed, but each thread seems 
to index all documents. If I have 5000 documents and 4 threads then the full 
import says: Added/Updated: 2 documents.

 DIH MultiThreaded bug
 -

 Key: SOLR-3011
 URL: https://issues.apache.org/jira/browse/SOLR-3011
 Project: Solr
  Issue Type: Sub-task
  Components: contrib - DataImportHandler
Affects Versions: 3.5, 4.0
Reporter: Mikhail Khludnev
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3011.patch, SOLR-3011.patch


 current DIH design is not thread safe. see last comments at SOLR-2382 and 
 SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly 
 it's a SOLR-2947 patch from 28th Dec. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



ConjunctionScorer.doNext() overstays?

2012-03-01 Thread mark harwood
Due to the odd behaviour of a custom Scorer of mine I discovered 
ConjunctionScorer.doNext() could loop indefinitely.
It does not bail out as soon as any scorer.advance() call it makes reports back 
NO_MORE_DOCS. Is there not a performance optimisation to be gained in exiting 
as soon as this happens?
At this stage I cannot see any point in continuing to advance other scorers - 
a quick look at TermScorer suggests that any questionable calls made by 
ConjunctionScorer to advance to NO_MORE_DOCS receives no special treatment and 
disk will be hit as a consequence.
I added an extra condition to the while loop on the 3.5 source:

    while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) {

and JUnit tests passed. I haven't been able to benchmark performance 
improvements but it looks like it would be sensible to make the change anyway.
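
For context, a rough sketch of how the 3.5-era doNext() loop reads with that
guard in place (from memory, so names are approximate; not the committed code):

    private int doNext() throws IOException {
      int first = 0;
      int doc = scorers[scorers.length - 1].docID();
      Scorer firstScorer;
      // Bail out as soon as doc is NO_MORE_DOCS instead of continuing to
      // advance the remaining scorers against an exhausted one.
      while ((doc != NO_MORE_DOCS)
          && ((firstScorer = scorers[first]).docID() < doc)) {
        doc = firstScorer.advance(doc);
        first = (first == scorers.length - 1) ? 0 : first + 1;
      }
      return doc;
    }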

Cheers,
Mark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12569 - Still Failing

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12569/

All tests passed

Build Log (for compile errors):
[...truncated 14641 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Solr-trunk - Build # 1779 - Failure

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-trunk/1779/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw 
uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:788)
at 
org.apache.lucene.util.LuceneTestCase.access$1100(LuceneTestCase.java:138)
at 
org.apache.lucene.util.LuceneTestCase$InternalSetupTeardownRule$1.evaluate(LuceneTestCase.java:612)
at 
org.apache.lucene.util.LuceneTestCase$TestResultInterceptorRule$1.evaluate(LuceneTestCase.java:511)
at 
org.apache.lucene.util.LuceneTestCase$RememberThreadRule$1.evaluate(LuceneTestCase.java:573)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:165)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:57)
at 
org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:816)
at 
org.apache.lucene.util.LuceneTestCase.tearDownInternal(LuceneTestCase.java:760)




Build Log (for compile errors):
[...truncated 10431 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: ThreadPool threads leaking to suite scope.

2012-03-01 Thread Dawid Weiss
 1) initialize threads eagerly; use ThreadPoolExecutor and call
 prestartAllCoreThreads. this could be applied to LTC on the trunk.

I did this but threads still leak out from unclosed readers created by
LTC#newSearcher. I don't know why, but this isn't called --

   r.addReaderClosedListener(new ReaderClosedListener() {
     @Override
     public void onClose(IndexReader reader) {
       shutdownExecutorService(ex);
     }
   });
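
As an aside, the eager initialization from point (1) amounts to roughly this
sketch (pool sizes illustrative only):

    // Eagerly create the pool's core threads up front instead of lazily
    // on first submit (java.util.concurrent).
    ThreadPoolExecutor ex = new ThreadPoolExecutor(
        2, 2, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
    ex.prestartAllCoreThreads();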

Clues?

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12570 - Still Failing

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12570/

All tests passed

Build Log (for compile errors):
[...truncated 14657 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

RE: ThreadPool threads leaking to suite scope.

2012-03-01 Thread Uwe Schindler
I think the problem in newSearcher is that sometimes the reader is wrapped. If 
it's wrapped, only the underlying reader is closed, not the wrapper. But the 
listener is added to the wrapper. We should add the listener to the original 
inner reader.
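
A minimal sketch of that idea, assuming the test utility keeps a handle on the
reader it was given before wrapping (names illustrative, not the actual
LuceneTestCase code):

    // Register the listener on the original (inner) reader, not the wrapper,
    // so it still fires when only the inner reader gets closed.
    final IndexReader inner = originalReader; // the reader before any wrapping
    inner.addReaderClosedListener(new ReaderClosedListener() {
      @Override
      public void onClose(IndexReader reader) {
        shutdownExecutorService(ex); // free the searcher's thread pool
      }
    });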

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Dawid Weiss [mailto:dawid.we...@gmail.com]
 Sent: Thursday, March 01, 2012 11:51 AM
 To: dev@lucene.apache.org
 Subject: Re: ThreadPool threads leaking to suite scope.
 
  1) initialize threads eagerly; use ThreadPoolExecutor and call
  prestartAllCoreThreads. this could be applied to LTC on the trunk.
 
 I did this but threads still leak out from unclosed readers created by
 LTC#newSearcher. I don't know why, but this isn't called --
 
 r.addReaderClosedListener(new ReaderClosedListener() {
   @Override
   public void onClose(IndexReader reader) {
     shutdownExecutorService(ex);
   }
 });
 
 Clues?
 
 Dawid
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3177) Excluding tagged filter in StatsComponent

2012-03-01 Thread CP (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219955#comment-13219955
 ] 

CP commented on SOLR-3177:
--

This feature is also necessary while using multi-select range facets with 
facet.range to get min and max of a field to set facet.range.start and 
facet.range.end.

 Excluding tagged filter in StatsComponent
 -

 Key: SOLR-3177
 URL: https://issues.apache.org/jira/browse/SOLR-3177
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 3.5, 3.6
Reporter: Mark Schoy
Priority: Minor
  Labels: localparams, stats, statscomponent

 It would be useful to exclude the effects of some fq params from the set of 
 documents used to compute stats -- similar to 
 how you can exclude tagged filters when generating facet counts... 
 https://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
 So that it's possible to do something like this... 
 http://localhost:8983/solr/select?fq={!tag=priceFilter}price:[1 TO 20]&q=*:*&stats=true&stats.field={!ex=priceFilter}price 
 If you want to create a price slider this is very useful because then you can 
 filter the price ([1 TO 20]) and nevertheless get the lower and upper bound of 
 the unfiltered price (min=0, max=100):
 {noformat}
 |-[---]--|
 $0 $1 $20$100
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2703) Add support for the Lucene Surround Parser

2012-03-01 Thread Shalu Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219958#comment-13219958
 ] 

Shalu Singh commented on SOLR-2703:
---

Hi Ahmet, I am trying to include the SOLR-2703.patch into Solr 3.5, downloaded 
from the SVN branches, to provide the surround parser. But it is not working 
after including the patch. Do you know how to apply it?

 Add support for the Lucene Surround Parser
 --

 Key: SOLR-2703
 URL: https://issues.apache.org/jira/browse/SOLR-2703
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 4.0
Reporter: Simon Rosenthal
Assignee: Erik Hatcher
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-2703.patch, SOLR-2703.patch, SOLR-2703.patch


 The Lucene/contrib surround parser provides support for span queries. This 
 issue adds a Solr plugin for this parser

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ThreadPool threads leaking to suite scope.

2012-03-01 Thread Dawid Weiss
I don't know how to fix it, Uwe, but I know there's definitely
something not all right with it because threads just keep accumulating
(as new searchers are created).

I've pushed a static seed for which this is repeatable; this is a
heavily worked-on branch but it may lead you to how to fix this:

git clone git://github.com/dweiss/lucene_solr.git
git checkout 935e1e9e9a350d6b35b23c4545caf78e82b42747

try to run TestPhraseQuery (you'll need -ea in Eclipse).

Dawid

On Thu, Mar 1, 2012 at 11:55 AM, Uwe Schindler u...@thetaphi.de wrote:
 I think the problem in newSearcher is that sometimes the reader is wrapped. 
 If it's wrapped, only the underlying reader is closed, not the wrapper. But 
 the listener is added to the wrapper. We should add the listener to the 
 original inner reader.

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Dawid Weiss [mailto:dawid.we...@gmail.com]
 Sent: Thursday, March 01, 2012 11:51 AM
 To: dev@lucene.apache.org
 Subject: Re: ThreadPool threads leaking to suite scope.

  1) initialize threads eagerly; use ThreadPoolExecutor and call
  prestartAllCoreThreads. this could be applied to LTC on the trunk.

 I did this but threads still leak out from unclosed readers created by
 LTC#newSearcher. I don't know why, but this isn't called --

        r.addReaderClosedListener(new ReaderClosedListener() {
          @Override
          public void onClose(IndexReader reader) {
            shutdownExecutorService(ex);
          }
        });

 Clues?

 Dawid

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug

2012-03-01 Thread Mikhail Khludnev (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219961#comment-13219961
 ] 

Mikhail Khludnev commented on SOLR-3011:


Petr,

Your feedback is quite appreciated. 
How much is your full indexing time reduced after multithreading is enabled?
Please be aware that you are at risk of SOLR-2804.

 DIH MultiThreaded bug
 -

 Key: SOLR-3011
 URL: https://issues.apache.org/jira/browse/SOLR-3011
 Project: Solr
  Issue Type: Sub-task
  Components: contrib - DataImportHandler
Affects Versions: 3.5, 4.0
Reporter: Mikhail Khludnev
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3011.patch, SOLR-3011.patch


 current DIH design is not thread safe. see last comments at SOLR-2382 and 
 SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly 
 it's a SOLR-2947 patch from 28th Dec. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3013) Add UIMA based tokenizers / filters that can be used in the schema.xml

2012-03-01 Thread Tommaso Teofili (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13219962#comment-13219962
 ] 

Tommaso Teofili commented on SOLR-3013:
---

it should be ok now.

 Add UIMA based tokenizers / filters that can be used in the schema.xml
 --

 Key: SOLR-3013
 URL: https://issues.apache.org/jira/browse/SOLR-3013
 Project: Solr
  Issue Type: Improvement
  Components: update
Affects Versions: 3.5
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
  Labels: uima, update_request_handler
 Fix For: 3.6, 4.0

 Attachments: SOLR-3013.patch


 Add UIMA based tokenizers / filters that can be declared and used directly 
 inside the schema.xml.
 Thus instead of using the UIMA UpdateRequestProcessor one could directly 
 define per-field NLP capable tokenizers / filters.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: ThreadPool threads leaking to suite scope.

2012-03-01 Thread Uwe Schindler
Hi,

Yeah it's strange. I checked the code: it either does wrap and runs 
single-thread searches, or it does *not* wrap and runs several threads. So 
theoretically it should work correctly... We have to check that all IndexReaders 
are correctly closed and the listeners are called.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: dawid.we...@gmail.com [mailto:dawid.we...@gmail.com] On Behalf Of
 Dawid Weiss
 Sent: Thursday, March 01, 2012 12:02 PM
 To: dev@lucene.apache.org
 Subject: Re: ThreadPool threads leaking to suite scope.
 
 I don't know how to fix it, Uwe, but I know there's definitely something not
 all right with it because threads just keep accumulating (as new searchers
 are created).
 
 I've pushed a static seed for which this is repeatable; this is a heavily
 worked-on branch but it may lead you to how to fix this:
 
 git clone git://github.com/dweiss/lucene_solr.git
 git checkout 935e1e9e9a350d6b35b23c4545caf78e82b42747
 
 try to run TestPhraseQuery (you'll need -ea in Eclipse).
 
 Dawid
 
 On Thu, Mar 1, 2012 at 11:55 AM, Uwe Schindler u...@thetaphi.de wrote:
  I think the problem in newSearcher is that sometimes the reader is wrapped.
 If it's wrapped, only the underlying reader is closed, not the wrapper. But the
 listener is added to the wrapper. We should add the listener to the original
 inner reader.
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Dawid Weiss [mailto:dawid.we...@gmail.com]
  Sent: Thursday, March 01, 2012 11:51 AM
  To: dev@lucene.apache.org
  Subject: Re: ThreadPool threads leaking to suite scope.
 
   1) initialize threads eagerly; use ThreadPoolExecutor and call
   prestartAllCoreThreads. this could be applied to LTC on the trunk.
 
  I did this but threads still leak out from unclosed readers created
  by LTC#newSearcher. I don't know why, but this isn't called --
 
  r.addReaderClosedListener(new ReaderClosedListener() {
    @Override
    public void onClose(IndexReader reader) {
      shutdownExecutorService(ex);
    }
  });
 
  Clues?
 
  Dawid
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
  -
  To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For
  additional commands, e-mail: dev-h...@lucene.apache.org
 
 
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
 commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still Failing

2012-03-01 Thread Uwe Schindler
Hi Tommaso:

Can you check the javadocs-warnings? We have now 15 of them and this fails the 
Jenkins builds...:
https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/console

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
 Sent: Thursday, March 01, 2012 8:59 AM
 To: dev@lucene.apache.org
 Subject: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still 
 Failing
 
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/
 
 All tests passed
 
 Build Log (for compile errors):
 [...truncated 14777 lines...]
 



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

2012-03-01 Thread Doron Cohen (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen reassigned LUCENE-3821:
---

Assignee: Doron Cohen

 SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
 ---

 Key: LUCENE-3821
 URL: https://issues.apache.org/jira/browse/LUCENE-3821
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.5, 4.0
Reporter: Naomi Dushay
Assignee: Doron Cohen
 Attachments: LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml


 The general bug is a case where a phrase with no slop is found,
 but if you add slop it's not.
 I committed a test today (TestSloppyPhraseQuery2) that actually triggers this 
 case; Jenkins just hasn't had enough time to chew on it.
 ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtests.iter=100 is enough to make 
 it fail on trunk or 3.x

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still Failing

2012-03-01 Thread Tommaso Teofili
Hi Uwe,
I just checked in the fix for that, should be ok now.
Tommaso

2012/3/1 Uwe Schindler u...@thetaphi.de

 Hi Tommaso:

 Can you check the javadocs-warnings? We have now 15 of them and this fails
 the Jenkins builds...:
 https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/console

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


  -Original Message-
  From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
  Sent: Thursday, March 01, 2012 8:59 AM
  To: dev@lucene.apache.org
  Subject: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 12568 - Still
 Failing
 
  Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12568/
 
  All tests passed
 
  Build Log (for compile errors):
  [...truncated 14777 lines...]
 



 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Created] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Andrzej Bialecki (Created) (JIRA)
Most Codec.*Format().*Reader() methods should use SegmentReadState
--

 Key: LUCENE-3836
 URL: https://issues.apache.org/jira/browse/LUCENE-3836
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 4.0


Codec formats API for opening readers is inconsistent - sometimes it uses 
SegmentReadState, in other cases it uses individual arguments that are already 
available via SegmentReadState. This complicates extending the API, e.g. if 
additional per-segment state would need to be passed to the readers.
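
A hedged sketch of the direction this implies (illustrative signature, not the 
committed API): each format's reader factory would take the full state object 
rather than a subset of its fields.

{code}
// Illustrative only: reader creation keyed off SegmentReadState, so extra
// per-segment state can be threaded through without signature changes.
public abstract class PostingsFormat {
  public abstract FieldsProducer fieldsProducer(SegmentReadState state)
      throws IOException;
}
{code}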

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1846/

All tests passed

Build Log (for compile errors):
[...truncated 14441 lines...]

check-misc-uptodate:

jar-misc:

check-spatial-uptodate:

jar-spatial:

check-grouping-uptodate:

jar-grouping:

check-queries-uptodate:

jar-queries:

check-queryparser-uptodate:

jar-queryparser:

prep-lucene-jars:

common.init:

compile-lucene-core:

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[javac] Compiling 1 source file to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/core/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

compile-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java
[javac] Compiling 8 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21:
 error: package org.apache.lucene.analysis.uima does not exist
[javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21:
 error: package org.apache.lucene.analysis.uima does not exist
[javac] import 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26:
 error: package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProvider;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27:
 error: package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51:
 error: cannot find symbol
[javac]   private AEProvider aeProvider;
[javac]   ^
[javac]   symbol:   class AEProvider
[javac]   location: class UIMAUpdateRequestProcessor
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44:
 error: cannot find symbol
[javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, 
input);
[javac]^
[javac]   symbol:   class UIMAAnnotationsTokenizer
[javac]   location: class UIMAAnnotationsTokenizerFactory
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46:
 error: cannot find symbol
[javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, 
tokenType, featurePath, input);
[javac]^
[javac]   symbol:   class UIMATypeAwareAnnotationsTokenizer
[javac]   location: class UIMATypeAwareAnnotationsTokenizerFactory
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64:
 error: cannot find symbol
[javac] aeProvider = 
AEProviderFactory.getInstance().getAEProvider(solrCore.getName(),
[javac]  ^
[javac]   symbol:   variable AEProviderFactory
[javac]   location: class UIMAUpdateRequestProcessor
[javac] 8 errors
[...truncated 15 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: ConjunctionScorer.doNext() overstays?

2012-03-01 Thread mark harwood
I got round to some benchmarking of this change on Wikipedia content which 
shows a small improvement:   http://goo.gl/60wJG

Aside from the small performance gain to be had, it just feels more logical if 
ConjunctionScorer does not issue sub scorers with a request to advance to 
NO_MORE_DOCS.




- Original Message -
From: mark harwood markharw...@yahoo.co.uk
To: dev@lucene.apache.org dev@lucene.apache.org
Cc: 
Sent: Thursday, 1 March 2012, 9:39
Subject: ConjunctionScorer.doNext() overstays?

Due to the odd behaviour of a custom Scorer of mine I discovered 
ConjunctionScorer.doNext() could loop indefinitely.
It does not bail out as soon as any scorer.advance() call it makes reports back 
NO_MORE_DOCS. Is there not a performance optimisation to be gained in exiting 
as soon as this happens?
At this stage I cannot see any point in continuing to advance other scorers - 
a quick look at TermScorer suggests that any questionable calls made by 
ConjunctionScorer to advance to NO_MORE_DOCS receive no special treatment and 
disk will be hit as a consequence.
I added an extra condition to the while loop on the 3.5 source:

    while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) {

and JUnit tests passed. I haven't been able to benchmark performance 
improvements, but it looks like it would be sensible to make the change anyway.
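
For illustration, the surrounding method with that extra guard would look 
roughly like this (a sketch based on the 3.5 ConjunctionScorer source, 
otherwise unchanged):

    private int doNext() throws IOException {
      int first = 0;
      int doc = scorers[scorers.length - 1].docID();
      Scorer firstScorer;
      // New guard: stop as soon as any sub-scorer is exhausted, instead of
      // asking the remaining scorers to advance(NO_MORE_DOCS).
      while (doc != NO_MORE_DOCS
          && (firstScorer = scorers[first]).docID() < doc) {
        doc = firstScorer.advance(doc);
        first = first == scorers.length - 1 ? 0 : first + 1;
      }
      return doc;
    }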

Cheers,
Mark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Andrzej Bialecki (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated LUCENE-3836:
--

Attachment: LUCENE-3836.patch

Patch that implements the change. If there are no objections I'd like to commit 
this soon.

 Most Codec.*Format().*Reader() methods should use SegmentReadState
 --

 Key: LUCENE-3836
 URL: https://issues.apache.org/jira/browse/LUCENE-3836
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 4.0

 Attachments: LUCENE-3836.patch


 Codec formats API for opening readers is inconsistent - sometimes it uses 
 SegmentReadState, in other cases it uses individual arguments that are 
 already available via SegmentReadState. This complicates extending the API, 
 e.g. if additional per-segment state would need to be passed to the readers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12571 - Still Failing

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12571/

All tests passed

Build Log (for compile errors):
[...truncated 12267 lines...]
check-memory-uptodate:

jar-memory:

check-misc-uptodate:

jar-misc:

check-spatial-uptodate:

jar-spatial:

check-grouping-uptodate:

jar-grouping:

check-queries-uptodate:

jar-queries:

check-queryparser-uptodate:

jar-queryparser:

prep-lucene-jars:

common.init:

compile-lucene-core:

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[javac] Compiling 1 source file to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/core/classes/java

compile-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java
[javac] Compiling 8 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21:
 package org.apache.lucene.analysis.uima does not exist
[javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21:
 package org.apache.lucene.analysis.uima does not exist
[javac] import 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26:
 package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProvider;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27:
 package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51:
 cannot find symbol
[javac] symbol  : class AEProvider
[javac] location: class 
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor
[javac]   private AEProvider aeProvider;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44:
 cannot find symbol
[javac] symbol  : class UIMAAnnotationsTokenizer
[javac] location: class 
org.apache.solr.uima.analysis.UIMAAnnotationsTokenizerFactory
[javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, 
input);
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46:
 cannot find symbol
[javac] symbol  : class UIMATypeAwareAnnotationsTokenizer
[javac] location: class 
org.apache.solr.uima.analysis.UIMATypeAwareAnnotationsTokenizerFactory
[javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, 
tokenType, featurePath, input);
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64:
 cannot find symbol
[javac] symbol  : variable AEProviderFactory
[javac] location: class 
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor
[javac] aeProvider = 
AEProviderFactory.getInstance().getAEProvider(solrCore.getName(),
[javac]  ^
[javac] 8 errors
[...truncated 14 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Assigned] (LUCENE-3767) Explore streaming Viterbi search in Kuromoji

2012-03-01 Thread Christian Moen (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen reassigned LUCENE-3767:
--

Assignee: Christian Moen  (was: Michael McCandless)

 Explore streaming Viterbi search in Kuromoji
 

 Key: LUCENE-3767
 URL: https://issues.apache.org/jira/browse/LUCENE-3767
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Michael McCandless
Assignee: Christian Moen
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3767.patch, LUCENE-3767.patch, LUCENE-3767.patch, 
 LUCENE-3767.patch, LUCENE-3767.patch, SolrXml-5498.xml, compound_diffs.txt


 I've been playing with the idea of changing the Kuromoji viterbi
 search to be 2 passes (intersect, backtrace) instead of 4 passes
 (break into sentences, intersect, score, backtrace)... this is very
 much a work in progress, so I'm just getting my current state up.
 It's got tons of nocommits, doesn't properly handle the user dict nor
 extended modes yet, etc.
 One thing I'm playing with is to add a double backtrace for the long
 compound tokens, ie, instead of penalizing these tokens so that
 shorter tokens are picked, leave the scores unchanged but on backtrace
 take that penalty and use it as a threshold for a 2nd best
 segmentation...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220003#comment-13220003
 ] 

Robert Muir commented on LUCENE-3836:
-

I think this change is OK: I just want to mention that avoiding 
SegmentReadState 
was definitely intentional... well, most of my issues are really based on
SegmentWriteState, but I think the whole concept is broken, see below:

SegmentWriteState is bad news: for many codec APIs
it would be underpopulated, or even contain bogus data!

For example, what would be SegmentWriteState.numDocs for StoredFieldsWriter?

I understand that at a glance having foo(A) where A has A.B and A.C and A.D 
seems simpler than foo(B, C),
but I think it's confusing to pass A at all if there is an A.D that's somehow 
bogus, invalid, etc.

In that case it's actually much clearer to pass B and C directly... personally I 
think we 
should revisit these 'argument holder' APIs and likely remove them completely.

Because of that: for most codec APIs I avoided SegmentWriteState and also 
SegmentReadState (for symmetry).
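
A made-up fragment (illustrative names, not the real API) of the difference:

    // Holder style: the caller must populate numDocs even where it is not
    // yet known, e.g. when opening a stored-fields writer during a flush:
    StoredFieldsWriter openWriter(SegmentWriteState state) throws IOException {
      int numDocs = state.numDocs; // bogus/unknown at this point
      // ...
      return null; // sketch only
    }

    // Explicit style: pass only what is actually valid here:
    StoredFieldsWriter openWriter(Directory dir, String segment, IOContext ctx)
        throws IOException {
      // ...
      return null; // sketch only
    }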

 Most Codec.*Format().*Reader() methods should use SegmentReadState
 --

 Key: LUCENE-3836
 URL: https://issues.apache.org/jira/browse/LUCENE-3836
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 4.0

 Attachments: LUCENE-3836.patch


 Codec formats API for opening readers is inconsistent - sometimes it uses 
 SegmentReadState, in other cases it uses individual arguments that are 
 already available via SegmentReadState. This complicates extending the API, 
 e.g. if additional per-segment state would need to be passed to the readers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

2012-03-01 Thread Mike Spencer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Spencer updated SOLR-3185:
---

Description: 
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 
'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis 
will give the expected result of 'A&B'. I examined the index with both 
standalone Luke and the schema browser field and the index value is incorrect 
in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(^\w)\s[&]\s(\w)"
replacement="$1&$2" />

  was:
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 
'A&B' (no spaces) will result in 'A&B' being indexed. Query analysis will 
give the expected result of 'A&B'. I examined the index with both standalone 
Luke and the schema browser field and the index value is incorrect in both 
tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(^\w)\s[&]\s(\w)"
replacement="$1&$2" />


 PatternReplaceCharFilterFactory can't replace with ampersands in index
 --

 Key: SOLR-3185
 URL: https://issues.apache.org/jira/browse/SOLR-3185
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.5
Reporter: Mike Spencer
Priority: Minor
  Labels: PatternReplaceCharFilter, regex

 Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) 
 with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query 
 analysis will give the expected result of 'A&B'. I examined the index with 
 both standalone Luke and the schema browser field and the index value is 
 incorrect in both tools.
 This is the affected charFilter:
 <charFilter class="solr.PatternReplaceCharFilterFactory"
 pattern="(^\w)\s[&]\s(\w)"
 replacement="$1&$2" />
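
 For reference, a minimal way to exercise the filter outside Solr (a sketch 
 against the Lucene 3.x analysis API, mirroring the pattern and replacement 
 above):

     import java.io.StringReader;
     import java.util.regex.Pattern;
     import org.apache.lucene.analysis.CharReader;
     import org.apache.lucene.analysis.CharStream;
     import org.apache.lucene.analysis.pattern.PatternReplaceCharFilter;

     public class PatternReplaceCheck {
       public static void main(String[] args) throws Exception {
         CharStream in = CharReader.get(new StringReader("A & B"));
         PatternReplaceCharFilter filter = new PatternReplaceCharFilter(
             Pattern.compile("(^\\w)\\s[&]\\s(\\w)"), "$1&$2", in);
         char[] buf = new char[32];
         int len = filter.read(buf, 0, buf.length);
         System.out.println(new String(buf, 0, len)); // prints: A&B
       }
     }

 If the replacement itself were broken, a check like this would show it; the 
 report's evidence points at the indexed value instead.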

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

2012-03-01 Thread Mike Spencer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Spencer updated SOLR-3185:
---

Description: 
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 
'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis 
will give the expected result of 'A&B'. I examined the index with both 
standalone Luke and the schema browser field and the index value is incorrect 
in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(^\w)\s[&]\s(\w)"
replacement="$1&amp;$2" />

  was:
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 
'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query analysis 
will give the expected result of 'A&B'. I examined the index with both 
standalone Luke and the schema browser field and the index value is incorrect 
in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(^\w)\s[&]\s(\w)"
replacement="$1&$2" />


 PatternReplaceCharFilterFactory can't replace with ampersands in index
 --

 Key: SOLR-3185
 URL: https://issues.apache.org/jira/browse/SOLR-3185
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.5
Reporter: Mike Spencer
Priority: Minor
  Labels: PatternReplaceCharFilter, regex

 Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) 
 with 'A&B' (no spaces) will result in 'A&amp;B' being indexed. Query 
 analysis will give the expected result of 'A&B'. I examined the index with 
 both standalone Luke and the schema browser field and the index value is 
 incorrect in both tools.
 This is the affected charFilter:
 <charFilter class="solr.PatternReplaceCharFilterFactory"
 pattern="(^\w)\s[&]\s(\w)"
 replacement="$1&amp;$2" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2438) Case Insensitive Search for Wildcard Queries

2012-03-01 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2438?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220014#comment-13220014
 ] 

Erick Erickson commented on SOLR-2438:
--

Duto:

A couple of things. First, in the future could you post this kind of usage 
question to the users list? See: http://lucene.apache.org/solr/discussion.html. 
No big deal, but that way more people see the discussion and benefit.

But to your question:
Have you enabled leading wildcard? See the ReversedWildcardFilterFactory. 
Leading wildcards need some special handling because in the simple case, 
finding them means you have to examine every term in the field, which can be 
very expensive.

Second, you could get away with just using one analyzer since they're all the 
same, as
<analyzer>
.
.
.
</analyzer>

if no 'type=...' is specified, then the index, query, and multiterm chains 
all use the analyzer definition.

I doubt this issue is related to this JIRA; I think it's just the normal 
leading wildcard issues.

Here's a discussion of this in some detail if you haven't seen it yet:
http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/


Erick

 Case Insensitive Search for Wildcard Queries
 

 Key: SOLR-2438
 URL: https://issues.apache.org/jira/browse/SOLR-2438
 Project: Solr
  Issue Type: Improvement
Reporter: Peter Sturge
Assignee: Erick Erickson
 Fix For: 3.6, 4.0

 Attachments: SOLR-2438-3x.patch, SOLR-2438-3x.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, SOLR-2438.patch, 
 SOLR-2438_3x.patch


 This patch adds support to allow case-insensitive queries on wildcard 
 searches for configured TextField field types.
 This patch extends the excellent work done by Yonik and Michael in SOLR-219.
 The approach here is different enough (imho) to warrant a separate JIRA issue.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3011) DIH MultiThreaded bug

2012-03-01 Thread Wenca Petr (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220016#comment-13220016
 ] 

Wenca Petr commented on SOLR-3011:
--

Hi Mikhail,
I know about 2804, I solved it by disabling logging as someone advised (I 
think).

Without multithreading I was able to index about 15k documents per minute; with 
4 threads, on average about 45k per minute. After applying your patch it seems 
to me that it fell to 30k per minute. But the number of processed documents is 
wrong. I have 50k documents to be indexed. I start a full dump; it processes 
about 44k documents during the first minute, but it continues past 50k to a 
total of 200k processed, with a decreasing number of docs per minute and a 
total time of more than 7 minutes. After the commit the index contains 50k 
documents, which is right.

 DIH MultiThreaded bug
 -

 Key: SOLR-3011
 URL: https://issues.apache.org/jira/browse/SOLR-3011
 Project: Solr
  Issue Type: Sub-task
  Components: contrib - DataImportHandler
Affects Versions: 3.5, 4.0
Reporter: Mikhail Khludnev
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3011.patch, SOLR-3011.patch


 current DIH design is not thread safe. see last comments at SOLR-2382 and 
 SOLR-2947. I'm going to provide the patch makes DIH core threadsafe. Mostly 
 it's a SOLR-2947 patch from 28th Dec. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ConjunctionScorer.doNext() overstays?

2012-03-01 Thread Michael McCandless
Hmm, the tradeoff is an added per-hit check (doc != NO_MORE_DOCS), vs
the one-time cost at the end of calling advance(NO_MORE_DOCS) for each
sub-clause?  I think in general this isn't a good tradeoff?

Ie what about the case where we AND high-freq, and similarly freq'd,
terms together?  Then, the per-hit check will at some point dominate?

It's valid to pass NO_MORE_DOCS to DocsEnum.advance.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 1, 2012 at 7:22 AM, mark harwood markharw...@yahoo.co.uk wrote:
 I got round to some benchmarking of this change on Wikipedia content which 
 shows a small improvement:   http://goo.gl/60wJG

 Aside from the small performance gain to be had, it just feels more logical 
 if ConjunctionScorer does not issue sub scorers with a request to advance to 
 NO_MORE_DOCS.




 - Original Message -
 From: mark harwood markharw...@yahoo.co.uk
 To: dev@lucene.apache.org dev@lucene.apache.org
 Cc:
 Sent: Thursday, 1 March 2012, 9:39
 Subject: ConjunctionScorer.doNext() overstays?

 Due to the odd behaviour of a custom Scorer of mine I discovered 
 ConjunctionScorer.doNext() could loop indefinitely.
 It does not bail out as soon as any scorer.advance() call it makes reports 
 back NO_MORE_DOCS. Is there not a performance optimisation to be gained in 
 exiting as soon as this happens?
 At this stage I cannot see any point in continuing to advance other scorers - 
 a quick look at TermScorer suggests that any questionable calls made by 
 ConjunctionScorer to advance to NO_MORE_DOCS receive no special treatment 
 and disk will be hit as a consequence.
 I added an extra condition to the while loop on the 3.5 source:

     while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) {

 and JUnit tests passed. I haven't been able to benchmark performance 
 improvements, but it looks like it would be sensible to make the change anyway.

 Cheers,
 Mark

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ConjunctionScorer.doNext() overstays?

2012-03-01 Thread mark harwood
I would have assumed the many int comparisons would cost less than the 
superfluous disk accesses? (I bow to your considerable experience in this area!)
What is the worst-case scenario on added disk reads? Could it be as bad 
as numberOfSegments x numberOfOtherscorers before the query winds up?
On the index I tried, it looked like an improvement - the spreadsheet I linked 
to has the source for the benchmark on a second worksheet if you want to give 
it a whirl on a different dataset.



- Original Message -
From: Michael McCandless luc...@mikemccandless.com
To: dev@lucene.apache.org; mark harwood markharw...@yahoo.co.uk
Cc: 
Sent: Thursday, 1 March 2012, 13:31
Subject: Re: ConjunctionScorer.doNext() overstays?

Hmm, the tradeoff is an added per-hit check (doc != NO_MORE_DOCS), vs
the one-time cost at the end of calling advance(NO_MORE_DOCS) for each
sub-clause?  I think in general this isn't a good tradeoff?

Ie what about the case where we AND high-freq, and similarly freq'd,
terms together?  Then, the per-hit check will at some point dominate?

It's valid to pass NO_MORE_DOCS to DocsEnum.advance.

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 1, 2012 at 7:22 AM, mark harwood markharw...@yahoo.co.uk wrote:
 I got round to some benchmarking of this change on Wikipedia content which 
 shows a small improvement:   http://goo.gl/60wJG

 Aside from the small performance gain to be had, it just feels more logical 
 if ConjunctionScorer does not issue sub scorers with a request to advance to 
 NO_MORE_DOCS.




 - Original Message -
 From: mark harwood markharw...@yahoo.co.uk
 To: dev@lucene.apache.org dev@lucene.apache.org
 Cc:
 Sent: Thursday, 1 March 2012, 9:39
 Subject: ConjunctionScorer.doNext() overstays?

 Due to the odd behaviour of a custom Scorer of mine I discovered 
 ConjunctionScorer.doNext() could loop indefinitely.
 It does not bail out as soon as any scorer.advance() call it makes reports 
 back NO_MORE_DOCS. Is there not a performance optimisation to be gained in 
 exiting as soon as this happens?
 At this stage I cannot see any point in continuing to advance other scorers - 
 a quick look at TermScorer suggests that any questionable calls made by 
 ConjunctionScorer to advance to NO_MORE_DOCS receive no special treatment 
 and disk will be hit as a consequence.
 I added an extra condition to the while loop on the 3.5 source:

     while ((doc != NO_MORE_DOCS) && ((firstScorer = scorers[first]).docID() < doc)) {

 and JUnit tests passed. I haven't been able to benchmark performance 
 improvements, but it looks like it would be sensible to make the change anyway.

 Cheers,
 Mark

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3181) New Admin UI, allow user to somehow cut/paste all the old Zookeeper info.

2012-03-01 Thread Stefan Matheis (steffkes) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3181:


Attachment: SOLR-3181.patch

Hm, what about something like that? We could allow {{?dump=true}} as a param 
for the ZookeeperServlet and reuse {{printZnode()}}, which is already used for 
showing the details.

(Yes, the output actually contains escaped quotes, because the change from 
SOLR-3162 is pending.)
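
For illustration (hypothetical host/port), the whole dump would then be 
reachable as:

    http://localhost:8983/solr/zookeeper?dump=true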

 New Admin UI, allow user to somehow cut/paste all the old Zookeeper info.
 ---

 Key: SOLR-3181
 URL: https://issues.apache.org/jira/browse/SOLR-3181
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
 Environment: n/a
Reporter: Erick Erickson
Assignee: Erick Erickson
Priority: Minor
 Attachments: SOLR-3181.patch


 When tracking down issues with ZK, the devs ask about various bits of data 
 from the cloud pages. It would be convenient to be able to just capture all 
 the data from the old /solr/admin/zookeeper.jsp page in the admin interface 
 to be able to send it to anyone debugging the info. 
 Perhaps just a "get debug info for Apache". Or even more cool, "copy debug 
 info to clipboard" if that's possible. Is this just the raw data that the 
 cloud view is manipulating? It doesn't have to be pretty although indentation 
 would be nice.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3174) Visualize Cluster State

2012-03-01 Thread Stefan Matheis (steffkes) (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220053#comment-13220053
 ] 

Stefan Matheis (steffkes) commented on SOLR-3174:
-

I'll try to launch a small Cloud on my local VMware and build an example w/ 
each of these libraries .. so we'll see which fits our requirements best - will 
need your input on this, for sure ;)

 Visualize Cluster State
 ---

 Key: SOLR-3174
 URL: https://issues.apache.org/jira/browse/SOLR-3174
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley

 It would be great to visualize the cluster state in the new UI. 
 See Mark's wish:
 https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220058#comment-13220058
 ] 

Michael McCandless commented on LUCENE-3836:


I agree catch-all argument holder classes are dangerous... they can bloat 
over time and probably lead to bugs...

 Most Codec.*Format().*Reader() methods should use SegmentReadState
 --

 Key: LUCENE-3836
 URL: https://issues.apache.org/jira/browse/LUCENE-3836
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 4.0

 Attachments: LUCENE-3836.patch


 Codec formats API for opening readers is inconsistent - sometimes it uses 
 SegmentReadState, in other cases it uses individual arguments that are 
 already available via SegmentReadState. This complicates extending the API, 
 e.g. if additional per-segment state would need to be passed to the readers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing

2012-03-01 Thread Steven A Rowe
Tommaso, it looks like the solr/contrib/uima/ build is broken?

-Original Message-
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] 
Sent: Thursday, March 01, 2012 7:16 AM
To: dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still 
Failing

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1846/

All tests passed

Build Log (for compile errors):
[...truncated 14441 lines...]

check-misc-uptodate:

jar-misc:

check-spatial-uptodate:

jar-spatial:

check-grouping-uptodate:

jar-grouping:

check-queries-uptodate:

jar-queries:

check-queryparser-uptodate:

jar-queryparser:

prep-lucene-jars:

common.init:

compile-lucene-core:

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[javac] Compiling 1 source file to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/core/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 1 warning

compile-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java
[javac] Compiling 8 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction with 
-source 1.6
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21:
 error: package org.apache.lucene.analysis.uima does not exist

[javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21:
 error: package org.apache.lucene.analysis.uima does not exist
[javac] import 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26:
 error: package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProvider;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27:
 error: package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51:
 error: cannot find symbol
[javac]   private AEProvider aeProvider;
[javac]   ^
[javac]   symbol:   class AEProvider
[javac]   location: class UIMAUpdateRequestProcessor
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44:
 error: cannot find symbol
[javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, 
input);
[javac]^
[javac]   symbol:   class UIMAAnnotationsTokenizer
[javac]   location: class UIMAAnnotationsTokenizerFactory
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46:
 error: cannot find symbol
[javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, 
tokenType, featurePath, input);
[javac]^
[javac]   symbol:   class UIMATypeAwareAnnotationsTokenizer
[javac]   location: class UIMATypeAwareAnnotationsTokenizerFactory
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64:
 error: cannot find symbol
[javac] aeProvider = 

Re: ConjunctionScorer.doNext() overstays?

2012-03-01 Thread Michael McCandless
On Thu, Mar 1, 2012 at 8:49 AM, mark harwood markharw...@yahoo.co.uk wrote:
 I would have assumed the many int comparisons would cost less than the 
 superfluous disk accesses? (I bow to your considerable experience in this 
 area!)
 What is the worst-case scenario on added disk reads? Could it be as bad 
 as numberOfSegments x numberOfOtherscorers before the query winds up?

Well, it depends -- the disk access is a one-time thing but the added
per-hit check is per-hit.  At some point it'll cross over...

I think likely the advance(NO_MORE_DOCS) will not usually hit disk:
our skipper impl fully pre-buffers (in RAM) the top skip lists I
think?  Even if we do go to disk it's likely the OS pre-cached those
bytes in its IO buffer.

 On the index I tried, it looked like an improvement - the spreadsheet I 
 linked to has the source for the benchmark on a second worksheet if you want 
 to give it a whirl on a different dataset.

Maybe try it on a more balanced case?  Ie, N high-freq terms whose
freq is close-ish?  And on slow queries (I think the results in your
spreadsheet are very fast queries right?  The slowest one was ~0.95
msec per query, if I'm reading it right?).

In general I think not slowing down the worst-case queries is much
more important than speeding up the super-fast queries.
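
To make the crossover explicit (illustrative symbols, not numbers from the
thread): with H hits scored by the conjunction, c_cmp the cost of the extra
per-hit comparison, S sub-clauses and c_adv the one-time cost of a trailing
advance(NO_MORE_DOCS), the guard only pays off while

    H * c_cmp < S * c_adv

so as H grows the per-hit check must eventually dominate.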

Mike

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3173) Database semantics - insert and update

2012-03-01 Thread Per Steffensen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220062#comment-13220062
 ] 

Per Steffensen commented on SOLR-3173:
--

Believe we will be able to use _version_ if:
a) There is a realtime way of getting the _version_ corresponding to a given 
id (or whatever you use as uniqueKey). Let's call this getRealtimeVersion(id)
b) The _version_ for a given id returned by getRealtimeVersion(id) never 
changes unless changes have been made to the document with that id (created, 
updated or deleted)
c) That getRealtimeVersion(id) will immediately return that new _version_ as 
soon as a change has been made - no soft- or hard-commit necessary. Well, that 
is the realtime part :-)
d) I will always get a negative number (hopefully always -1) from 
getRealtimeVersion(id) when calling with an id where there is no corresponding 
document in the solr-core - no matter if there has never been such a document 
or if it has been there but has been deleted.

Can you please confirm or correct me on the above bullets, Yonik. It would also 
be very helpful if you would provide the code for getRealtimeVersion(id), 
assuming that I am in DirectUpdateHandler2. Thanks a lot!

Guess this version-checking stuff is only necessary on primary (or master or 
whatever you call it) shards and not on replica (or slave) shards. How do I 
know in DirectUpdateHandler2 if I am a primary/master or replica/slave shard?

I regret a little bit the idea about different URLs stated in the comment 
above. Guess I would just like to state info about the wanted semantics in the 
query in some other way. I guess it would be nice with a "semantics" URL param 
with the possible values db-insert, db-update, db-update-version-checked and 
classic-solr-update:
- semantics=db-insert: Index document doc if and only if 
getRealtimeVersion(doc.id) returns -1. Else return a DocumentAlreadyExists error
- semantics=db-update: Replace the existing document if it exists, else return 
a DocumentDoesNotExist error
- semantics=db-update-version-checked: As db-update, but if _version_ on the 
provided document does not correspond to the existing 
getRealtimeVersion(doc.id), return a VersionConflict error
- semantics=classic-solr-update: Do exactly as update does today in Solr
classic-solr-update will be used if semantics is not specified in the update 
request - it is the default. In solrconfig.xml you will be able to change the 
default semantics plus provide a list of semantics that are not allowed.
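
To make the intended branching concrete, a rough sketch (plain Java, not 
Solr's API; getRealtimeVersion(id) and doIndex(doc) are the assumed helpers 
from above, and the error strings are mine):

    void processAdd(SolrInputDocument doc, String semantics) throws IOException {
      String id = (String) doc.getFieldValue("id");
      long current = getRealtimeVersion(id); // -1 if no such document
      if ("db-insert".equals(semantics)) {
        if (current != -1) throw new IOException("DocumentAlreadyExists");
        doIndex(doc);
      } else if ("db-update".equals(semantics)) {
        if (current == -1) throw new IOException("DocumentDoesNotExist");
        doIndex(doc);
      } else if ("db-update-version-checked".equals(semantics)) {
        long supplied = ((Number) doc.getFieldValue("_version_")).longValue();
        if (supplied != current) throw new IOException("VersionConflict");
        doIndex(doc);
      } else { // classic-solr-update, the default
        doIndex(doc);
      }
    }

Throwing IOException here just stands in for whatever error reporting Solr 
would actually use.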

Regards, Per Steffensen

 Database semantics - insert and update
 --

 Key: SOLR-3173
 URL: https://issues.apache.org/jira/browse/SOLR-3173
 Project: Solr
  Issue Type: New Feature
  Components: update
Affects Versions: 3.5
 Environment: All
Reporter: Per Steffensen
Assignee: Per Steffensen
  Labels: RDBMS, insert, nosql, uniqueKey, update
 Fix For: 4.0

   Original Estimate: 168h
  Remaining Estimate: 168h

 In order to increase the ability of Solr to be used as a NoSql database (lots 
 of concurrent inserts, updates, deletes and queries in the entire lifetime of 
 the index) instead of just a search index (first: everything indexed (in one 
 thread), after: only queries), I would like Solr to support the following 
 features inspired by RDBMSs and other NoSql databases.
 * Given a solr-core with a schema containing a uniqueKey-field uniqueField 
 and a document Dold, when trying to INSERT a new document Dnew where 
 Dold.uniqueField is equal to Dnew.uniqueField, then I want a 
 DocumentAlreadyExists error. If no such document Dold exists I want Dnew 
 indexed into the solr-core.
 * Given a solr-core with a schema containing a uniqueKey-field uniqueField 
 and a document Dold, when trying to UPDATE a document Dnew where 
 Dold.uniqueField is equal to Dnew.uniqueField, I want Dold deleted from and 
 Dnew added to the index (just as it is today). If no such document Dold 
 exists I want nothing to happen (Dnew is not added to the index)
 The essence of this issue is to be able to state your intent (insert or 
 update) and have slightly different semantics (from each other and the 
 existing update) depending on your intent.
 The functionality provided by this issue is only really meaningful when you 
 run with updateLog activated.
 This issue might be solved more or less at the same time as SOLR-3178, and 
 only one single SVN patch might be given to cover both issues.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 - Still Failing

2012-03-01 Thread Tommaso Teofili
I removed too many lines inside its build.xml in the r1295508 commit; I'm
working to fix it.
Tommaso

2012/3/1 Steven A Rowe sar...@syr.edu

 Tomasso, it looks like the solr/contrib/uima/ build is broken?

 -Original Message-
 From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
 Sent: Thursday, March 01, 2012 7:16 AM
 To: dev@lucene.apache.org
 Subject: [JENKINS] Lucene-Solr-tests-only-trunk-java7 - Build # 1846 -
 Still Failing

 Build:
 https://builds.apache.org/job/Lucene-Solr-tests-only-trunk-java7/1846/

 All tests passed

 Build Log (for compile errors):
 [...truncated 14441 lines...]

 check-misc-uptodate:

 jar-misc:

 check-spatial-uptodate:

 jar-spatial:

 check-grouping-uptodate:

 jar-grouping:

 check-queries-uptodate:

 jar-queries:

 check-queryparser-uptodate:

 jar-queryparser:

 prep-lucene-jars:

 common.init:

 compile-lucene-core:

 jflex-uptodate-check:

 jflex-notice:

 javacc-uptodate-check:

 javacc-notice:

 init:

 clover.setup:

 clover.info:
 [echo]
 [echo]   Clover not found. Code coverage reports disabled.
 [echo]

 clover:

 common.compile-core:
[javac] Compiling 1 source file to
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/lucene/build/core/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction
 with -source 1.6
[javac] 1 warning

 compile-core:

 init:

 clover.setup:

 clover.info:
 [echo]
 [echo]   Clover not found. Code coverage reports disabled.
 [echo]

 clover:

 common.compile-core:
[mkdir] Created dir:
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java
[javac] Compiling 8 source files to
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/build/contrib/solr-uima/classes/java
[javac] warning: [options] bootstrap class path not set in conjunction
 with -source 1.6
[javac]
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21:
 error: package org.apache.lucene.analysis.uima does not exist

[javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
[javac]   ^
[javac]
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21:
 error: package org.apache.lucene.analysis.uima does not exist
[javac] import
 org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
[javac]   ^
[javac]
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26:
 error: package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProvider;
[javac]  ^
[javac]
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27:
 error: package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
[javac]  ^
[javac]
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51:
 error: cannot find symbol
[javac]   private AEProvider aeProvider;
[javac]   ^
[javac]   symbol:   class AEProvider
[javac]   location: class UIMAUpdateRequestProcessor
[javac]
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44:
 error: cannot find symbol
[javac] return new UIMAAnnotationsTokenizer(descriptorPath,
 tokenType, input);
[javac]^
[javac]   symbol:   class UIMAAnnotationsTokenizer
[javac]   location: class UIMAAnnotationsTokenizerFactory
[javac]
 /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk-java7/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46:
 error: cannot find symbol
[javac] return new
 UIMATypeAwareAnnotationsTokenizer(descriptorPath, tokenType, featurePath,
 input);
[javac]^
[javac]   symbol:   class UIMATypeAwareAnnotationsTokenizer
[javac]   location: class UIMATypeAwareAnnotationsTokenizerFactory
[javac]
 

[JENKINS] Lucene-Solr-tests-only-trunk - Build # 12572 - Still Failing

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/12572/

All tests passed

Build Log (for compile errors):
[...truncated 12207 lines...]
check-memory-uptodate:

jar-memory:

check-misc-uptodate:

jar-misc:

check-spatial-uptodate:

jar-spatial:

check-grouping-uptodate:

jar-grouping:

check-queries-uptodate:

jar-queries:

check-queryparser-uptodate:

jar-queryparser:

prep-lucene-jars:

common.init:

compile-lucene-core:

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[javac] Compiling 1 source file to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/core/classes/java

compile-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java
[javac] Compiling 8 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/build/contrib/solr-uima/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:21:
 package org.apache.lucene.analysis.uima does not exist
[javac] import org.apache.lucene.analysis.uima.UIMAAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:21:
 package org.apache.lucene.analysis.uima does not exist
[javac] import 
org.apache.lucene.analysis.uima.UIMATypeAwareAnnotationsTokenizer;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:26:
 package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProvider;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:27:
 package org.apache.lucene.analysis.uima.ae does not exist
[javac] import org.apache.lucene.analysis.uima.ae.AEProviderFactory;
[javac]  ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:51:
 cannot find symbol
[javac] symbol  : class AEProvider
[javac] location: class 
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor
[javac]   private AEProvider aeProvider;
[javac]   ^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMAAnnotationsTokenizerFactory.java:44:
 cannot find symbol
[javac] symbol  : class UIMAAnnotationsTokenizer
[javac] location: class 
org.apache.solr.uima.analysis.UIMAAnnotationsTokenizerFactory
[javac] return new UIMAAnnotationsTokenizer(descriptorPath, tokenType, 
input);
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/analysis/UIMATypeAwareAnnotationsTokenizerFactory.java:46:
 cannot find symbol
[javac] symbol  : class UIMATypeAwareAnnotationsTokenizer
[javac] location: class 
org.apache.solr.uima.analysis.UIMATypeAwareAnnotationsTokenizerFactory
[javac] return new UIMATypeAwareAnnotationsTokenizer(descriptorPath, 
tokenType, featurePath, input);
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java:64:
 cannot find symbol
[javac] symbol  : variable AEProviderFactory
[javac] location: class 
org.apache.solr.uima.processor.UIMAUpdateRequestProcessor
[javac] aeProvider = 
AEProviderFactory.getInstance().getAEProvider(solrCore.getName(),
[javac]  ^
[javac] 8 errors
[...truncated 14 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Andrzej Bialecki (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220071#comment-13220071
 ] 

Andrzej Bialecki  commented on LUCENE-3836:
---

I hear you ... SegmentWriteState is bad, I agree. But the argument about 
SegmentWriteState is not really applicable to SegmentReadState - write state is 
mutable and can change under your feet, whereas SegmentReadState is immutable, 
created once in SegmentReader and used only for the initialization of format 
readers. On the other hand, if we insist that we always pass individual 
arguments around, then providing some additional segment-global context to 
format readers requires changing method signatures (adding arguments).

The background for this issue is that I started looking at updateable fields, 
where updates are put in a segment (or reader) of its own and they provide an 
overlay for the main segment, with special codec magic to pull and remap 
data from the overlay as the main data is accessed. However, in order to do 
that I need to provide this data when format readers are initialized. I can't 
do this when creating a Codec instance (Codec is automatically instantiated) or 
when creating Codec.*Format(), because format instances are usually shared as 
well.

So the idea I had in mind was to use SegmentReadState uniformly, and put this 
overlay data in SegmentReadState so that it's passed down to formats during 
format readers' creation. I'm open to other ideas... :)
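
As a rough illustration (hypothetical names, not the actual 4.0 API), the 
uniform shape would be something like:

{code:java}
// Hedged sketch only: the real SegmentReadState carries Directory,
// SegmentInfo, FieldInfos and IOContext; 'extras' stands in for the proposed
// segment-global context (e.g. overlay data), not an existing field.
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

class SegmentReadStateSketch {
  final Object directory, segmentInfo, fieldInfos, ioContext; // stand-ins for the real types
  final Map<String, Object> extras = new HashMap<String, Object>(); // e.g. overlay data

  SegmentReadStateSketch(Object dir, Object si, Object fis, Object ctx) {
    directory = dir; segmentInfo = si; fieldInfos = fis; ioContext = ctx;
  }
}

interface FieldsProducerSketch {}

abstract class PostingsFormatSketch {
  // Uniform signature: state in, reader out. Adding segment-global context
  // later means adding to the state, not to every method signature.
  abstract FieldsProducerSketch fieldsProducer(SegmentReadStateSketch state) throws IOException;
}
{code}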

 Most Codec.*Format().*Reader() methods should use SegmentReadState
 --

 Key: LUCENE-3836
 URL: https://issues.apache.org/jira/browse/LUCENE-3836
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 4.0

 Attachments: LUCENE-3836.patch


 Codec formats API for opening readers is inconsistent - sometimes it uses 
 SegmentReadState, in other cases it uses individual arguments that are 
 already available via SegmentReadState. This complicates extending the API, 
 e.g. if additional per-segment state would need to be passed to the readers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3174) Visualize Cluster State

2012-03-01 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220076#comment-13220076
 ] 

Mark Miller commented on SOLR-3174:
---

If you are on a Unix machine, then in /solr/cloud-dev you can just run 
solrcloud-start.sh and it starts up a 2-shard, 4-node cluster automatically. 
Unfortunately, there are no Windows .bat files currently :(

 Visualize Cluster State
 ---

 Key: SOLR-3174
 URL: https://issues.apache.org/jira/browse/SOLR-3174
 Project: Solr
  Issue Type: New Feature
Reporter: Ryan McKinley

 It would be great to visualize the cluster state in the new UI. 
 See Mark's wish:
https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220082#comment-13220082
 ] 

Michael McCandless commented on LUCENE-3836:


{quote}
The background for this issue is that I started looking at updateable fields, 
where updates are put in a segment (or reader) of its own and they provide an 
overlay for the main segment, with special codec magic to pull and remap 
data from the overlay as the main data is accessed. However, in order to do 
that I need to provide this data when format readers are initialized. I can't 
do this when creating a Codec instance (Codec is automatically instantiated) or 
when creating Codec.*Format(), because format instances are usually shared as 
well.
{quote}

Sweet!

Couldn't the stacking/overlaying live above codec?  Eg, the codec thinks it's 
reading 3 segments, but really the code above knows there's 1 base segment and 
2 stacked on top?




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Andrzej Bialecki (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220093#comment-13220093
 ] 

Andrzej Bialecki  commented on LUCENE-3836:
---

I think this could work, too - I would instantiate the overlay data in 
SegmentReader, and then I'd create the overlay codec's format readers in 
SegmentReader, using the original format readers plus the overlay data. I'll 
try this approach ... I'll create a separate issue to discuss this.

Let's close this as won't fix for now.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Issue Comment Edited] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Andrzej Bialecki (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220093#comment-13220093
 ] 

Andrzej Bialecki  edited comment on LUCENE-3836 at 3/1/12 3:22 PM:
---

I think this could work, too - I would instantiate the overlay data in 
SegmentReader, and then I'd create the overlay codec's format readers in 
SegmentReader, using the original format readers plus the overlay data. I'll 
try this approach ... I'll create a separate issue to discuss this.

(The reason I'm doing this at the codec level is that I wanted to avoid heavy 
mods to SegmentReader, and it's easier to visualize how this data is re-mapped 
and stacked at the level of fairly simple codec APIs).

Let's close this as won't fix for now.

  was (Author: ab):
I think this could work, too - I would instantiate the overlay data in 
SegmentReader, and then I'd create the overlay codec's format readers in 
SegmentReader, using the original format readers plus the overlay data. I'll 
try this approach ... I'll create a separate issue to discuss this.

Let's close this as won't fix for now.
  
 Most Codec.*Format().*Reader() methods should use SegmentReadState
 --

 Key: LUCENE-3836
 URL: https://issues.apache.org/jira/browse/LUCENE-3836
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/codecs
Reporter: Andrzej Bialecki 
Assignee: Andrzej Bialecki 
 Fix For: 4.0

 Attachments: LUCENE-3836.patch


 Codec formats API for opening readers is inconsistent - sometimes it uses 
 SegmentReadState, in other cases it uses individual arguments that are 
 already available via SegmentReadState. This complicates extending the API, 
 e.g. if additional per-segment state would need to be passed to the readers.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220105#comment-13220105
 ] 

Robert Muir commented on LUCENE-3836:
-

{quote}
(The reason I'm doing this at the codec level is that I wanted to avoid heavy 
mods to SegmentReader, and it's easier to visualize how this data is re-mapped 
and stacked at the level of fairly simple codec APIs).
{quote}

But SegmentReader is fairly simple these days: it's basically just a pointer to 
a core (SegmentCoreReaders) + deletes.

Maybe it should stay the same, but instead we could have a StackedReader 
(perhaps a bad name) that points to multiple cores + deletes + mask files or 
whatever it needs and returns masked enums over the underlying enums itself 
(e.g. combining enums from the underlying impls, passing masks down as Bits, 
and such). SegmentReader would stay as-is.
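
For illustration, a tiny sketch of the lookup such a stacked reader would do 
(hypothetical types, not a committed API):

{code:java}
// Hedged sketch: the newest stacked core wins, otherwise fall through to the
// base core. 'DocData' stands in for whatever per-doc view is being masked
// (postings, stored fields, etc.); it is not a real Lucene type.
final class StackedReaderSketch {
  interface DocData {
    boolean has(int docID);
    Object get(int docID);
  }

  private final DocData base;
  private final DocData[] stacked;  // newest last

  StackedReaderSketch(DocData base, DocData[] stacked) {
    this.base = base;
    this.stacked = stacked;
  }

  Object get(int docID) {
    for (int i = stacked.length - 1; i >= 0; i--) {
      if (stacked[i].has(docID)) {
        return stacked[i].get(docID);  // masked: the update shadows the base
      }
    }
    return base.get(docID);
  }
}
{code}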





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #407: POMs out of sync

2012-03-01 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/407/

No tests ran.

Build Log (for compile errors):
[...truncated 20066 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Created] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Andrzej Bialecki (Created) (JIRA)
A modest proposal for updateable fields
---

 Key: LUCENE-3837
 URL: https://issues.apache.org/jira/browse/LUCENE-3837
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/index
Affects Versions: 4.0
Reporter: Andrzej Bialecki 


I'd like to propose a simple design for implementing updateable fields in 
Lucene. This design has some limitations, so I'm not claiming it will be 
appropriate for every use case, and it's obvious it has some performance 
consequences, but at least it's a start...

This proposal uses a concept of overlays or stacked updates, where the 
original data is not removed but instead it's overlaid with the new data. I 
propose to reuse as much of the existing APIs as possible, and represent 
updates as an IndexReader. Updates to documents in a specific segment would be 
collected in an overlay index specific to that segment, i.e. there would be 
as many overlay indexes as there are segments in the primary index. 

A field update would be represented as a new document in the overlay index. 
The document would consist of just the updated fields, plus a field that 
records the id in the primary segment of the document affected by the update. 
These updates would be processed as usual via secondary IndexWriter-s, as many 
as there are primary segments, so the same analysis chains would be used, the 
same field types, etc.

On opening a segment with updates the SegmentReader (see also LUCENE-3836) 
would check for the presence of the overlay index, and if so it would open it 
first (as an AtomicReader? or it would open individual codec format readers? 
perhaps it should load the whole thing into memory?), and it would construct an 
in-memory map between the primary's docId-s and the overlay's docId-s. And 
finally it would wrap the original format readers with overlay readers, 
initialized also with the id map.

Now, when consumers of the 4D API would ask for specific data, the overlay 
readers would first re-map the primary's docId to the overlay's docId, and 
check whether overlay data exists for that docId and this type of data (e.g. 
postings, stored fields, vectors) and return this data instead of the original. 
Otherwise they would return the original data.

One obvious performance issue with this approach is that the sequential access 
to primary data would translate into random access to the overlay data. This 
could be solved by sorting the overlay index so that at least the overlay ids 
increase monotonically as primary ids do.

Updates to the primary index would be handled as usual, i.e. segment merges 
(since the segments with updates would pretend to have no overlays) would just 
work as usual; only the overlay index would have to be deleted once the primary 
segment is deleted after merge.

Updates to the existing documents that already had some fields updated would be 
again handled as usual, only underneath they would open an IndexWriter on the 
overlay index for a specific segment.

That's the broad idea. Feel free to pipe in - I started some coding at the 
codec level but got stuck using the approach in LUCENE-3836. The approach that 
uses a modified SegmentReader seems more promising.
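
For illustration, a minimal sketch of the in-memory id map described above, 
assuming the overlay has been sorted so overlay ids increase monotonically 
with primary ids (all names hypothetical):

{code:java}
// Hedged sketch: two parallel sorted arrays plus a binary search per lookup.
import java.util.Arrays;

final class OverlayIdMapSketch {
  private final int[] primaryDocs;  // primary docIDs that have updates, ascending
  private final int[] overlayDocs;  // overlayDocs[i] is the overlay docID for primaryDocs[i]

  OverlayIdMapSketch(int[] primaryDocs, int[] overlayDocs) {
    this.primaryDocs = primaryDocs;
    this.overlayDocs = overlayDocs;
  }

  /** Returns the overlay docID for a primary docID, or -1 if the doc has no update. */
  int overlayDoc(int primaryDoc) {
    int idx = Arrays.binarySearch(primaryDocs, primaryDoc);
    return idx >= 0 ? overlayDocs[idx] : -1;
  }
}
{code}

An overlay format reader would consult overlayDoc(...) first and fall back to 
the primary data on -1.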

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-3836) Most Codec.*Format().*Reader() methods should use SegmentReadState

2012-03-01 Thread Andrzej Bialecki (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  resolved LUCENE-3836.
---

Resolution: Won't Fix

Thanks for the insightful comments - this looks promising. I opened LUCENE-3837 
to discuss a broader design for updateable fields.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ConjunctionScorer.doNext() overstays?

2012-03-01 Thread mark harwood
Fair points.
I've tried several index sizes and blends of query term frequencies now, and 
the results swing only marginally between the two implementations.
Sometimes the early-exit logic is marginally faster and other times 
marginally slower. Using a larger index seemed to reduce the improvement I had 
seen in my initial results.

So overall, not a clear improvement and not worth bothering with because, as 
you suggest, various disk caching strategies probably mitigate the cost of the 
added reads.

Based on your comments re the added int comparison cost in that hot loop, it 
made me think that the abstract DocIdSetIterator.docID() method call could be 
questioned on that basis too?
It looks like all DocIdSetIterator subclasses maintain a doc variable mutated 
elsewhere in advance() and next() calls, and docID() is meant to be idempotent, 
so presumably a shared variable in the base class could avoid a docID() method 
invocation?
Anyhoo the profiler did not show that method up as any sort of hotspot so I 
don't think it's an issue.


Thanks, Mike.




- Original Message -
From: Michael McCandless luc...@mikemccandless.com
To: dev@lucene.apache.org; mark harwood markharw...@yahoo.co.uk
Cc: 
Sent: Thursday, 1 March 2012, 14:18
Subject: Re: ConjunctionScorer.doNext() overstays?

On Thu, Mar 1, 2012 at 8:49 AM, mark harwood markharw...@yahoo.co.uk wrote:
 I would have assumed the many int comparisons would cost less than the 
 superfluous disk accesses? (I bow to your considerable experience in this 
 area!)
 What is the worst-case scenario on added disk reads? Could it be as bad 
 as numberOfSegments x numberOfOtherscorers before the query winds up?

Well, it depends -- the disk access is a one-time thing but the added
per-hit check is per-hit.  At some point it'll cross over...

I think likely the advance(NO_MORE_DOCS) will not usually hit disk:
our skipper impl fully pre-buffers (in RAM) the top skip lists I
think?  Even if we do go to disk it's likely the OS pre-cached those
bytes in its IO buffer.

 On the index I tried, it looked like an improvement - the spreadsheet I 
 linked to has the source for the benchmark on a second worksheet if you want 
 to give it a whirl on a different dataset.

Maybe try it on a more balanced case?  Ie, N high-freq terms whose
freq is close-ish?  And on slow queries (I think the results in your
spreadsheet are very fast queries right?  The slowest one was ~0.95
msec per query, if I'm reading it right?).

In general I think not slowing down the worst-case queries is much
more important than speeding up the super-fast queries.

Mike
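
For reference, a minimal sketch of the early-exit variant under discussion, 
written against the public DocIdSetIterator API (illustrative only, not the 
actual ConjunctionScorer.doNext()):

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

final class LeapfrogSketch {
  // Advance all iterators to a common doc >= 'doc'; bail out as soon as any
  // iterator is exhausted instead of driving the rest to NO_MORE_DOCS.
  static int doNext(DocIdSetIterator[] scorers, int doc) throws IOException {
    while (true) {
      boolean match = true;
      for (DocIdSetIterator s : scorers) {
        int d = s.docID();
        if (d < doc) {
          d = s.advance(doc);
        }
        if (d == DocIdSetIterator.NO_MORE_DOCS) {
          return DocIdSetIterator.NO_MORE_DOCS;  // early exit: conjunction is done
        }
        if (d > doc) {
          doc = d;       // new candidate; restart the round
          match = false;
          break;
        }
      }
      if (match) {
        return doc;      // every iterator is positioned on 'doc'
      }
    }
  }
}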

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220163#comment-13220163
 ] 

Robert Muir commented on LUCENE-3837:
-

Some concerns about scoring:

# the stats problem: maybe we should allow overlay readers to just return -1 
for docFreq? I don't like the situation today where the preflex codec doesn't 
implement all the stats (the whole -1 situation and 'optional' stats is 
frustrating), but I think it's worse to return out-of-bounds stuff, e.g. where 
docFreq > maxDoc. I think totalTermFreq is safe to just sum up though (it's 
wrong, but not out of bounds), and a similarity could safely use this to 
compute an expected IDF instead. Still, this part will be messy: unlike the 
newer stats in 4.0, lots of code I think expects that docFreq is always 
supported. Another possibility that I think I like more is to treat this 
conceptually just like deletes in every way, so all stats are supported but 
maxDoc is 'wrong' (includes masked-away documents); then nothing is out of 
bounds. So in this case we would add maxDoc(field), which is only used for 
scoring. For a normal reader this just returns maxDoc() as implemented today...
# the norms problem: although norms are implemented as docValues, currently all 
similarities assume that getArray()/hasArray() is implemented... but here I'm 
not sure that would be the case? We should probably measure whether the method 
call really even hurts; in general I think it's a burden on the codec to 
require that norms actually be representable as an array (maybe other use 
cases would want other data structures for less RAM)...

We could solve both of these issues separately and independently once we decide 
what we want to do.
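
For illustration, a rough sketch of the maxDoc(field) option (maxDoc(String) 
is the proposed addition here, not an existing Lucene method):

{code:java}
// Hedged sketch: all stats stay supported (no -1s), consistent with how
// deletes are handled; the per-field maxDoc may simply overcount
// masked-away documents.
interface FieldStatsSketch {
  int docFreq(String field, String term);         // always supported, never -1
  long totalTermFreq(String field, String term);  // summed across base + overlay
  int maxDoc();                                   // whole segment, as today
  // Proposed: scoring-only upper bound for one field. A normal reader would
  // just return maxDoc(); a reader with stacked updates may return more.
  int maxDoc(String field);
}
{code}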



[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Andrzej Bialecki (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220194#comment-13220194
 ] 

Andrzej Bialecki  commented on LUCENE-3837:
---

Ad 1. I don't think it's such a big deal; we already return approximate stats 
(too-high counts) in the presence of deletes. I think we should go all the way, 
at least initially, and ignore stats from an overlay completely, unless the 
data is present only in the overlay - e.g. for terms not present in the main 
index.

Ad 2. I think that if getArray() is supported, then on the first call we have 
to roll in all updates to the main array created from the primary.
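
For illustration, a minimal sketch of that roll-in (illustrative names, not 
the DocValues API):

{code:java}
// Hedged sketch: on the first getArray() call, clone the primary norms and
// overwrite the slots that have overlay updates.
final class NormsRollInSketch {
  static byte[] mergedNorms(byte[] primaryNorms,
                            int[] updatedPrimaryDocs,
                            byte[] overlayNorms) {
    byte[] merged = primaryNorms.clone();  // one-time copy
    for (int i = 0; i < updatedPrimaryDocs.length; i++) {
      merged[updatedPrimaryDocs[i]] = overlayNorms[i];  // overlay wins
    }
    return merged;
  }
}
{code}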




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ConjunctionScorer.doNext() overstays?

2012-03-01 Thread Michael McCandless
On Thu, Mar 1, 2012 at 11:55 AM, mark harwood markharw...@yahoo.co.uk wrote:

 Based on your comments re the added int comparison cost in that hot loop, it 
 made me think that the abstract DocIdSetIterator.docID() method call could be 
 questioned on that basis too?
 It looks like all DocIdSetIterator subclasses maintain a doc variable mutated 
 elsewhere in advance() and next() calls, and docID() is meant to be idempotent, 
 so presumably a shared variable in the base class could avoid a docID() 
 method invocation?
 Anyhoo the profiler did not show that method up as any sort of hotspot so I 
 don't think it's an issue.

Maybe we could explore that?  I'm not sure about hotspot implications
though... (vs private int accessible only via getter).

Ideally, consumers of DISI should hold onto the int docID returned
from next/advance and use that... (ie, don't call docID() again,
unless it's too hard to hold onto the returned doc).
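
For illustration, a minimal sketch of that consumer pattern (assuming only the 
public DocIdSetIterator API):

import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

final class DisiLoopSketch {
  // Hold onto the int returned by nextDoc(); no docID() call in the hot loop.
  static int countHits(DocIdSetIterator disi) throws IOException {
    int count = 0;
    for (int doc = disi.nextDoc();
         doc != DocIdSetIterator.NO_MORE_DOCS;
         doc = disi.nextDoc()) {
      count++;  // 'doc' is already the current docID
    }
    return count;
  }
}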

 Thanks, Mike.

Thank you!  Keep the ideas coming :)

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220207#comment-13220207
 ] 

Michael McCandless commented on LUCENE-3837:


Could we use the actual docID (ie same docID as the base segment)?  This way we 
wouldn't need the (possibly large) int[] to remap on each access.  I guess for 
postings this is OK (we can pass PostingsFormat any docIDs), but for eg stored 
fields, term vectors, doc values, it's not (they can't handle sparse docIDs).

Also, can't we directly write the stacked segments ourselves?  (Ie, within a 
single IW).

We'd need to extend SegmentInfo(s) to record which segments stack on which, and 
fix MP to understand stacking (and aggressively target the stacks).




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220208#comment-13220208
 ] 

Robert Muir commented on LUCENE-3837:
-

{quote}
Ad 1. I don't think it's such a big deal; we already return approximate stats 
(too-high counts) in the presence of deletes. I think we should go all the way, 
at least initially, and ignore stats from an overlay completely, unless the 
data is present only in the overlay - e.g. for terms not present in the main 
index.
{quote}

I disagree: it may not be a big deal for DefaultSimilarity, but it's important 
for other scoring implementations. Especially initially, it's extremely 
important that we get this stuff right before committing anything!

Large problems can result when the statistics are inconsistent with what is 
'discovered' in the DocsEnum. This is because many scoring models expect 
certain relationships to hold true: such as a single doc's tf value won't 
exceed totalTermFreq. We had to do significant work already to ensure 
consistency, though in some cases the problems could not totally be solved 
(BasicModelD, BasicModelP, BasicModelBE+NormalizationH3, etc.) and we 
unfortunately had to resort to only leaving warnings in the javadocs.

I'm fairly certain that in all cases we avoid things like NaN or negative 
scores, but a function that 'inverts relevance' is awful too.

So I think we need a consistent model for stats: that's why I lean towards 
maxDoc(field), which is consistent in every way with how we handle 
deletes, and it won't yield any surprises.


[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220211#comment-13220211
 ] 

Michael McCandless commented on LUCENE-3837:


I think for scoring the 'wrong yet consistent' stats approach is good?  (Just 
like deletes.)

So an update would affect scoring (e.g. on update the field now has 4 
occurrences of python vs only 1 occurrence before, so now it gets a better 
score), but the scoring will not precisely match the scores I'd get from a full 
re-index instead of an update.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Welcome Stefan Matheis

2012-03-01 Thread Erik Hatcher
Welcome, Stefan!  Your UI work is definitely much appreciated and very nice 
looking.

Erik

On Feb 29, 2012, at 16:04 , Ryan McKinley wrote:

 I'm pleased to announce that Stefan Matheis has joined our ranks as a 
 committer.
 
 He has given the solr admin UI some much needed love.  It now looks
 like it belongs in 2012!
 
 Stefan, it is tradition that you introduce yourself with a brief bio.
 
 Your SVN access should be ready to go.
 
 Welcome!
 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Andrzej Bialecki (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220300#comment-13220300
 ] 

Andrzej Bialecki  commented on LUCENE-3837:
---

That was my point: we should be able to come up with estimates that yield 
slightly wrong yet consistent stats. I don't know the details of the new 
similarities, so it's up to you, Robert, to come up with suggestions :)




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Andrzej Bialecki (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220315#comment-13220315
 ] 

Andrzej Bialecki  commented on LUCENE-3837:
---

bq. Could we use the actual docID (ie same docID as the base segment)?
Updates may arrive out of order, so the updates will naturally get different 
internal IDs (also, if you wanted to use the same ids they would have gaps). I 
don't know if various parts of Lucene can handle out of order ids coming from 
iterators? If we wanted to match the ids early then we would have to sort them, 
a la IndexSorter, on every flush and on every merge, which seems too costly. 
So, a re-mapping structure seems like a decent compromise. Yes, it could be 
large - we could put artificial limits on the number of updates before we do a 
merge.

bq. Also, can't we directly write the stacked segments ourselves? (Ie, within a 
single IW).
I don't know, it didn't seem likely to me - AFAIK IW operates on a single 
segment before flushing it? And updates could refer to docs outside the current 
segment.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Andrzej Bialecki (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13220315#comment-13220315
 ] 

Andrzej Bialecki  edited comment on LUCENE-3837 at 3/1/12 8:17 PM:
---

bq. Could we use the actual docID (ie same docID as the base segment)?
Updates may arrive out of order, so the updates will naturally get different 
internal IDs (also, if you wanted to use the same ids they would have gaps). I 
don't know if various parts of Lucene can handle out of order ids coming from 
iterators? If we wanted to match the ids early then we would have to sort them, 
a la IndexSorter, on every flush and on every merge, which seems too costly. 
So, a re-mapping structure seems like a decent compromise. Yes, it could be 
large - we could put artificial limits on the number of updates before we force 
a merge.

bq. Also, can't we directly write the stacked segments ourselves? (Ie, within a 
single IW).
I don't know, it didn't seem likely to me - AFAIK IW operates on a single 
segment before flushing it? And updates could refer to docs outside the current 
segment.

  was (Author: ab):
bq. Could we use the actual docID (ie same docID as the base segment)?
Updates may arrive out of order, so the updates will naturally get different 
internal IDs (also, if you wanted to use the same ids they would have gaps). I 
don't know if various parts of Lucene can handle out of order ids coming from 
iterators? If we wanted to match the ids early then we would have to sort them, 
a la IndexSorter, on every flush and on every merge, which seems too costly. 
So, a re-mapping structure seems like a decent compromise. Yes, it could be 
large - we could put artificial limits on the number of updates before we do a 
merge.

bq. Also, can't we directly write the stacked segments ourselves? (Ie, within a 
single IW).
I don't know, it didn't seem likely to me - AFAIK IW operates on a single 
segment before flushing it? And updates could refer to docs outside the current 
segment.
  
 A modest proposal for updateable fields
 ---

 Key: LUCENE-3837
 URL: https://issues.apache.org/jira/browse/LUCENE-3837
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/index
Affects Versions: 4.0
Reporter: Andrzej Bialecki 

 I'd like to propose a simple design for implementing updateable fields in 
 Lucene. This design has some limitations, so I'm not claiming it will be 
 appropriate for every use case, and it's obvious it has some performance 
 consequences, but at least it's a start...
 This proposal uses a concept of overlays or stacked updates, where the 
 original data is not removed but instead it's overlaid with the new data. I 
 propose to reuse as much of the existing APIs as possible, and represent 
 updates as an IndexReader. Updates to documents in a specific segment would 
 be collected in an overlay index specific to that segment, i.e. there would 
 be as many overlay indexes as there are segments in the primary index. 
 A field update would be represented as a new document in the overlay index. 
 The document would consist of just the updated fields, plus a field that 
 records the id in the primary segment of the document affected by the update. 
 These updates would be processed as usual via secondary IndexWriter-s, as 
 many as there are primary segments, so the same analysis chains would be 
 used, the same field types, etc.
 On opening a segment with updates the SegmentReader (see also LUCENE-3836) 
 would check for the presence of the overlay index, and if so it would open 
 it first (as an AtomicReader? or it would open individual codec format 
 readers? perhaps it should load the whole thing into memory?), and it would 
 construct an in-memory map between the primary's docId-s and the overlay's 
 docId-s. And finally it would wrap the original format readers with overlay 
 readers, initialized also with the id map.
 Now, when consumers of the 4D API would ask for specific data, the overlay 
 readers would first re-map the primary's docId to the overlay's docId, and 
 check whether overlay data exists for that docId and this type of data (e.g. 
 postings, stored fields, vectors) and return this data instead of the 
 original. Otherwise they would return the original data.
 One obvious performance issue with this approach is that the sequential 
 access to primary data would translate into random access to the overlay 
 data. This could be solved by sorting the overlay index so that at least the 
 overlay ids increase monotonically as primary ids do.
 Updates to the primary index would be handled as usual, i.e. segment merges 
 (since the segments with updates would pretend to have no overlays) would just 
 work as usual; only the 

[jira] [Created] (SOLR-3188) New admin page: Enable Polling button disappears after disabling polling and reloading page

2012-03-01 Thread Neil Hooey (Created) (JIRA)
New admin page: Enable Polling button disappears after disabling polling and 
reloading page
-

 Key: SOLR-3188
 URL: https://issues.apache.org/jira/browse/SOLR-3188
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Neil Hooey
Priority: Minor


When you go to this URL on a slave:
http://localhost:8983/solr/#/singlecore/replication

And click the Disable Polling button, you see a red bar that says 
invalid_master. I'm not sure why I get this red bar, as I haven't tested it 
outside of my own installation, but it seems normal.

If you then refresh the page, the Replicate Now and Enable Polling buttons 
disappear. It seems like their generation is being interrupted by the 
invalid_master error.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2020) HttpComponentsSolrServer

2012-03-01 Thread Sami Siren (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sami Siren updated SOLR-2020:
-

Attachment: SOLR-2020.patch

This patch completes the conversion. 

All tests pass, but there's still some cleanup work to do, plus a couple of 
places where I cut corners.


 HttpComponentsSolrServer
 

 Key: SOLR-2020
 URL: https://issues.apache.org/jira/browse/SOLR-2020
 Project: Solr
  Issue Type: New Feature
  Components: clients - java
Affects Versions: 1.4.1
 Environment: Any
Reporter: Chantal Ackermann
Priority: Minor
 Attachments: HttpComponentsSolrServer.java, 
 HttpComponentsSolrServerTest.java, SOLR-2020-HttpSolrServer.patch, 
 SOLR-2020.patch


 Implementation of SolrServer that uses the Apache Http Components framework.
 Http Components (http://hc.apache.org/) is the successor of Commons 
 HttpClient and thus HttpComponentsSolrServer would be a successor of 
 CommonsHttpSolrServer, in the future.
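
A hedged usage sketch of what client code might look like; the constructor
signature of HttpComponentsSolrServer is an assumption here, mirroring
CommonsHttpSolrServer's (String url) constructor:

{noformat}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class QueryDemo {
  public static void main(String[] args) throws Exception {
    // Assumed constructor, by analogy with CommonsHttpSolrServer.
    SolrServer server = new HttpComponentsSolrServer("http://localhost:8983/solr");
    QueryResponse rsp = server.query(new SolrQuery("*:*"));
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}
{noformat}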

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: ConjunctionScorer.doNext() overstays?

2012-03-01 Thread Mark Harwood

 Ideally, consumers of DISI should hold onto the int docID returned
 from next/advance and use that... (ie, don't call docID() again,
 unless it's too hard to hold onto the returned doc).
 

Yes, I remember raising that way back when: 
https://issues.apache.org/jira/browse/LUCENE-584?focusedCommentId=12564415page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12564415

Back then Mike B raised the issue of backwards compatibility, so I don't know if 
the 4.0 release presents the opportunity to revisit that idea.
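
For reference, a minimal sketch of the consumption pattern recommended above:
hold onto the int returned by nextDoc()/advance() instead of calling docID()
again on each hit.

{noformat}
import java.io.IOException;
import org.apache.lucene.search.DocIdSetIterator;

class DisiConsumer {
  static void consume(DocIdSetIterator disi) throws IOException {
    // Keep the returned doc in a local; never re-ask disi.docID().
    for (int doc = disi.nextDoc();
         doc != DocIdSetIterator.NO_MORE_DOCS;
         doc = disi.nextDoc()) {
      // process 'doc' directly here
    }
  }
}
{noformat}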



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Shai Erera (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220387#comment-13220387
 ] 

Shai Erera commented on LUCENE-3837:


Andrzej, this brings back [old 
memories|http://mail-archives.apache.org/mod_mbox/lucene-dev/201004.mbox/%3cu2s786fde51004250432gd50bec64m9b2f6ee6dd495...@mail.gmail.com%3E]
 :-).

The core difference in your proposal is that the updates are processed in a 
separate index, and that at runtime we use a PQ to match documents and collapse 
all the updates, right? And these updates will be reflected in the main index 
on segment merges, right?

I personally prefer a more integrated solution than one that's based on 
matching PQs, but since I barely did anything with my proposal for 2 years, I 
guess that your progress is better than no progress at all.

One comment -- when the updates are collapsed, they may not simply 
'replace' what exists before them. I could see an update to a document which 
adds a stored field, and therefore if I call IndexReader.document(i), I'd 
expect to see that stored field along with all the ones that existed before it.

At the time I felt that modifying Lucene to add stacked segments was way too 
complicated, and the indexing internals kept changing by the day. But now 
Codecs seem to be very stable, and the pace of change on trunk has relaxed, so 
perhaps it'll be worthwhile taking a second look at that proposal? (but only if 
you feel like it)

 A modest proposal for updateable fields
 ---

 Key: LUCENE-3837
 URL: https://issues.apache.org/jira/browse/LUCENE-3837
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/index
Affects Versions: 4.0
Reporter: Andrzej Bialecki 

 I'd like to propose a simple design for implementing updateable fields in 
 Lucene. This design has some limitations, so I'm not claiming it will be 
 appropriate for every use case, and it's obvious it has some performance 
 consequences, but at least it's a start...
 This proposal uses a concept of overlays or stacked updates, where the 
 original data is not removed but instead it's overlaid with the new data. I 
 propose to reuse as much of the existing APIs as possible, and represent 
 updates as an IndexReader. Updates to documents in a specific segment would 
 be collected in an overlay index specific to that segment, i.e. there would 
 be as many overlay indexes as there are segments in the primary index. 
 A field update would be represented as a new document in the overlay index. 
 The document would consist of just the updated fields, plus a field that 
 records the id in the primary segment of the document affected by the update. 
 These updates would be processed as usual via secondary IndexWriter-s, as 
 many as there are primary segments, so the same analysis chains would be 
 used, the same field types, etc.
 On opening a segment with updates the SegmentReader (see also LUCENE-3836) 
 would check for the presence of the overlay index, and if so it would open 
 it first (as an AtomicReader? or it would open individual codec format 
 readers? perhaps it should load the whole thing into memory?), and it would 
 construct an in-memory map between the primary's docId-s and the overlay's 
 docId-s. And finally it would wrap the original format readers with overlay 
 readers, initialized also with the id map.
 Now, when consumers of the 4D API would ask for specific data, the overlay 
 readers would first re-map the primary's docId to the overlay's docId, and 
 check whether overlay data exists for that docId and this type of data (e.g. 
 postings, stored fields, vectors) and return this data instead of the 
 original. Otherwise they would return the original data.
 One obvious performance issue with this approach is that the sequential 
 access to primary data would translate into random access to the overlay 
 data. This could be solved by sorting the overlay index so that at least the 
 overlay ids increase monotonically as primary ids do.
 Updates to the primary index would be handled as usual, i.e. segment merges 
 (since the segments with updates would pretend to have no overlays) would just 
 work as usual; only the overlay index would have to be deleted once the 
 primary segment is deleted after merge.
 Updates to the existing documents that already had some fields updated would 
 be again handled as usual, only underneath they would open an IndexWriter on 
 the overlay index for a specific segment.
 That's the broad idea. Feel free to pipe in - I started some coding at the 
 codec level but got stuck using the approach in LUCENE-3836. The approach 
 that uses a modified SegmentReader seems more promising.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your 

[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

2012-03-01 Thread Dawid Weiss (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220421#comment-13220421
 ] 

Dawid Weiss commented on SOLR-3185:
---

Are there any other filters in the chain? Because 
PatternReplaceCharFilterFactory itself doesn't replace any HTML entities, so 
that would be weird. Also, can you quote the XML verbatim? If you have this:

{noformat}
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(^\w)\s[&amp;]\s(\w)"
replacement="$1&amp;amp;$2" />
{noformat}
then indeed the replaced value will be:
{noformat}
$1&amp;$2
{noformat}
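
A small self-contained demo of that double-escaping effect, using plain
java.util.regex in place of the actual char filter (the pattern and inputs
are taken from the report above):

{noformat}
import java.util.regex.Pattern;

public class EscapeDemo {
  public static void main(String[] args) {
    Pattern p = Pattern.compile("(^\\w)\\s[&]\\s(\\w)");
    // replacement="$1&amp;$2" in the XML decodes to the string $1&$2:
    System.out.println(p.matcher("A & B").replaceAll("$1&$2"));     // A&B
    // replacement="$1&amp;amp;$2" decodes to $1&amp;$2, indexed literally:
    System.out.println(p.matcher("A & B").replaceAll("$1&amp;$2")); // A&amp;B
  }
}
{noformat}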

 PatternReplaceCharFilterFactory can't replace with ampersands in index
 --

 Key: SOLR-3185
 URL: https://issues.apache.org/jira/browse/SOLR-3185
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.5
Reporter: Mike Spencer
Priority: Minor
  Labels: PatternReplaceCharFilter, regex

 Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) 
 with 'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query 
 analysis will give the expected result of 'A&B'. I examined the index with 
 both standalone Luke and the schema browser field and the index value is 
 incorrect in both tools.
 This is the affected charFilter:
 <charFilter class="solr.PatternReplaceCharFilterFactory"
 pattern="(^\w)\s[&amp;]\s(\w)"
 replacement="$1&amp;amp;$2" />

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3837) A modest proposal for updateable fields

2012-03-01 Thread Andrzej Bialecki (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220420#comment-13220420
 ] 

Andrzej Bialecki  commented on LUCENE-3837:
---

bq. I guess that your progress is better than no progress at all.
That's my perspective too, and it's reflected in the title of this issue... I 
remember your description, and in fact my proposal is somewhat similar. It does 
not use PQs, but indeed it merges updates on the fly, at the cost of keeping a 
static map of primary-to-secondary ids and random seeking in the secondary 
index to retrieve matching data. Please check the description above. And then, 
once a segment merge is executed, the overlay data will be integrated into the 
main data, because the merge process will pull in this mix of new and old 
without being aware of it - it will be hidden by the Codec's read formats. 
Codec abstractions are great for this kind of manipulation.
bq. One comment – when the updates are collapsed, they may not simply 
'replace' what exists before them.
Right, old data will be returned if not overlaid by new data, meaning that e.g. 
old stored field values will be returned for all fields except the updated 
field, and for that field the data from the overlay will be returned.
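
A sketch of those collapse semantics, with documents simplified to plain maps
rather than Lucene's stored-document API:

{noformat}
import java.util.HashMap;
import java.util.Map;

class OverlayMerge {
  // Overlay field values shadow the base document's values field by field;
  // every field that was never updated passes through unchanged.
  static Map<String, String> merged(Map<String, String> base,
                                    Map<String, String> overlay) {
    Map<String, String> result = new HashMap<String, String>(base);
    result.putAll(overlay); // updated fields win
    return result;
  }
}
{noformat}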

 A modest proposal for updateable fields
 ---

 Key: LUCENE-3837
 URL: https://issues.apache.org/jira/browse/LUCENE-3837
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/index
Affects Versions: 4.0
Reporter: Andrzej Bialecki 

 I'd like to propose a simple design for implementing updateable fields in 
 Lucene. This design has some limitations, so I'm not claiming it will be 
 appropriate for every use case, and it's obvious it has some performance 
 consequences, but at least it's a start...
 This proposal uses a concept of overlays or stacked updates, where the 
 original data is not removed but instead it's overlaid with the new data. I 
 propose to reuse as much of the existing APIs as possible, and represent 
 updates as an IndexReader. Updates to documents in a specific segment would 
 be collected in an overlay index specific to that segment, i.e. there would 
 be as many overlay indexes as there are segments in the primary index. 
 A field update would be represented as a new document in the overlay index. 
 The document would consist of just the updated fields, plus a field that 
 records the id in the primary segment of the document affected by the update. 
 These updates would be processed as usual via secondary IndexWriter-s, as 
 many as there are primary segments, so the same analysis chains would be 
 used, the same field types, etc.
 On opening a segment with updates the SegmentReader (see also LUCENE-3836) 
 would check for the presence of the overlay index, and if so it would open 
 it first (as an AtomicReader? or it would open individual codec format 
 readers? perhaps it should load the whole thing into memory?), and it would 
 construct an in-memory map between the primary's docId-s and the overlay's 
 docId-s. And finally it would wrap the original format readers with overlay 
 readers, initialized also with the id map.
 Now, when consumers of the 4D API would ask for specific data, the overlay 
 readers would first re-map the primary's docId to the overlay's docId, and 
 check whether overlay data exists for that docId and this type of data (e.g. 
 postings, stored fields, vectors) and return this data instead of the 
 original. Otherwise they would return the original data.
 One obvious performance issue with this approach is that the sequential 
 access to primary data would translate into random access to the overlay 
 data. This could be solved by sorting the overlay index so that at least the 
 overlay ids increase monotonically as primary ids do.
 Updates to the primary index would be handled as usual, i.e. segment merges 
 (since the segments with updates would pretend to have no overlays) would just 
 work as usual; only the overlay index would have to be deleted once the 
 primary segment is deleted after merge.
 Updates to the existing documents that already had some fields updated would 
 be again handled as usual, only underneath they would open an IndexWriter on 
 the overlay index for a specific segment.
 That's the broad idea. Feel free to pipe in - I started some coding at the 
 codec level but got stuck using the approach in LUCENE-3836. The approach 
 that uses a modified SegmentReader seems more promising.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: [Lucene.Net] Merging 3.0.3 into Trunk

2012-03-01 Thread Christopher Currens
I agree with Prescott.  Make a patch for that sucker! :)


Thanks,
Christopher

On Thu, Mar 1, 2012 at 9:57 AM, Prescott Nasser geobmx...@hotmail.comwrote:

 Jira and then just submit your own patch imo

 Sent from my Windows Phone
 
 From: Stefan Bodewig
 Sent: 3/1/2012 7:23 AM
 To: lucene-net-...@incubator.apache.org
 Subject: Re: [Lucene.Net] Merging 3.0.3 into Trunk

 On 2012-02-29, Stefan Bodewig wrote:

  On 2012-02-28, Christopher Currens wrote:

  Alright, it's done!  3.0.3 is now merged in with Trunk!

  I'll see to running RAT and looking at the line-ends over the next few
  days so we can get them fixed once and not run into it with the release.

 I went for EOLs first and there are 621 files outside of lib and doc
 that need to be fixed.  What I have now is not just a patch (of more
 than 200k lines), but also a list of 621 files that need their
 svn:eol-style property to be set.

 I can create a JIRA ticket for that attaching my patch and the list of
 files to fix or - since I technically am a committer - could just commit
 my cleaned up workspace as is (plus JIRA ticket that I'd open and close
 myself).

 What would you prefer?

 RAT doesn't really make sense before the line feeds are correct (I've
 seen quite a few files without license headers by manual inspection).

 Stefan



Re: toString on Thread

2012-03-01 Thread Yonik Seeley
On Thu, Mar 1, 2012 at 5:20 PM, Dawid Weiss dawid.we...@gmail.com wrote:
 Overriding toString on a Thread is not a good idea. Can I remove it or
 at least make it simpler in ConcurrentMergeScheduler? This override
 caused a fantastic deadlock -- an interesting possibility I didn't
 think of -- again, when dumping threads (for the exception string)
 Thread.toString was invoked from what I thought was an isolated
 monitor (and it was); only toString had its own monitors underneath
 and here's what happened (simplified):

Ouch!
Now I've got to go think if we've done anything like that in Solr...

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: toString on Thread

2012-03-01 Thread Michael McCandless
Scary!  I think remove it?

It is nice to see what segments are being merged by the
thread... but this risk is awful.  An app can turn on IW's infoStream to
see it too...

Mike McCandless

http://blog.mikemccandless.com

On Thu, Mar 1, 2012 at 5:20 PM, Dawid Weiss dawid.we...@gmail.com wrote:
 Overriding toString on a Thread is not a good idea. Can I remove it or
 at least make it simpler in ConcurrentMergeScheduler? This override
 caused a fantastic deadlock -- an interesting possibility I didn't
 think of -- again, when dumping threads (for the exception string)
 Thread.toString was invoked from what I thought was an isolated
 monitor (and it was); only toString had its own monitors underneath
 and here's what happened (simplified):

 Lucene Merge Thread #1:
        at org.apache.lucene.index.IndexWriter.segString(IndexWriter.java:3764)
        - waiting to lock L5 (a org.apache.lucene.index.IndexWriter)
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.toString(ConcurrentMergeScheduler.java:499)
        ...
        at 
 org.apache.lucene.util.LuceneTestCase.getRandom(LuceneTestCase.java:276)
        at 
 org.apache.lucene.index.TestTransactions.access$100(TestTransactions.java:33)
        at 
 org.apache.lucene.index.TestTransactions$RandomFailure.eval(TestTransactions.java:40)
        at 
 org.apache.lucene.store.MockDirectoryWrapper.maybeThrowDeterministicException(MockDirectoryWrapper.java:688)
        - locked L4 (a org.apache.lucene.store.MockDirectoryWrapper)
        at 
 org.apache.lucene.store.MockDirectoryWrapper.createOutput(MockDirectoryWrapper.java:415)
        - locked L4 (a org.apache.lucene.store.MockDirectoryWrapper)
        at 
 org.apache.lucene.codecs.lucene40.Lucene40FieldInfosWriter.write(Lucene40FieldInfosWriter.java:56)
        at 
 org.apache.lucene.index.SegmentMerger.mergeFieldInfos(SegmentMerger.java:194)
        at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:109)
        at 
 org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3623)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3257)
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)
 Lucene Merge Thread #0:
        at 
 org.apache.lucene.store.MockDirectoryWrapper.listAll(MockDirectoryWrapper.java:695)
        - waiting to lock L4 (a org.apache.lucene.store.MockDirectoryWrapper)
        at 
 org.apache.lucene.index.IndexFileDeleter.refresh(IndexFileDeleter.java:345)
        at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3272)
        - locked L5 (a org.apache.lucene.index.IndexWriter)
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:382)
        at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:451)

 A classic, isn't it?

 Dawid

 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: toString on Thread

2012-03-01 Thread Dawid Weiss
 Ouch!
 Now I've got to go think if we've done anything like that in Solr...

Yeah... I honestly never thought about such a possibility, and I don't
think any sane person would ;)

I think this qualifies as a hack similar to the solution to this puzzler:
http://wouter.coekaerts.be/2012/puzzle-clowns

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: toString on Thread

2012-03-01 Thread Dawid Weiss
 Though it is nice to see what segments are being merged by the
 thread... but this risk is awful.  App can turn on IW's infoStream to
 see it too...

This could be possible by updating a volatile string somewhere and
only exposing it in the toString override, but I don't know if this is
worth the effort. The volatile will impose an additional happens-before
edge, etc.

Dawid
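
A minimal sketch of that volatile-string idea, on a hypothetical merge thread
(this is not the actual ConcurrentMergeScheduler code):

{noformat}
class MergeThread extends Thread {
  private volatile String mergeInfo = "idle";

  // Called from the merge loop, outside any lock.
  void updateMergeInfo(String info) {
    mergeInfo = info;
  }

  // Reads a plain field, so a thread dump can call toString() without ever
  // touching IndexWriter's monitor.
  @Override
  public String toString() {
    return getName() + " [" + mergeInfo + "]";
  }
}
{noformat}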

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: toString on Thread

2012-03-01 Thread Dawid Weiss
 Now I've got to go think if we've done anything like that in Solr...

I did a quick check via Eclipse's Java search and it seems nothing
else overrides Thread#toString() or Thread#getName(). I can't guarantee
anything, but I'm 99% sure we're safe from this one.

Dawid

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3017) Allow edismax stopword filter factory implementation to be specified

2012-03-01 Thread Michael Dodsworth (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Dodsworth resolved SOLR-3017.
-

Resolution: Fixed

Yonik's fix resolves this. Much appreciated.

 Allow edismax stopword filter factory implementation to be specified
 

 Key: SOLR-3017
 URL: https://issues.apache.org/jira/browse/SOLR-3017
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.0
Reporter: Michael Dodsworth
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3017-without-guava-alternative.patch, 
 SOLR-3017.patch, SOLR-3017.patch, edismax_stop_filter_factory.patch


 Currently, the edismax query parser assumes that stopword filtering is being 
 done by StopFilter: the removal of the stop filter is performed by looking 
 for an instance of 'StopFilterFactory' (hard-coded) within the associated 
 field's analysis chain.
 We'd like to be able to use our own stop filters whilst keeping the edismax 
 stopword removal goodness. The supplied patch allows the stopword filter 
 factory class to be supplied as a param, stopwordFilterClassName. If no 
 value is given, the default (StopFilterFactory) is used.
 Another option I looked into was to extend StopFilterFactory to create our 
 own filter. Unfortunately, StopFilterFactory's 'create' method returns 
 StopFilter, not TokenStream. StopFilter is also final.
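
A simplified sketch of the configurable check the patch describes, assuming
the 3.x-era org.apache.solr.analysis.TokenFilterFactory type; the class and
method names below are illustrative, not Solr's actual internals:

{noformat}
import org.apache.solr.analysis.TokenFilterFactory;

class StopFilterDetection {
  // Instead of hard-coding instanceof StopFilterFactory, compare against a
  // class name supplied as a parameter (e.g. stopwordFilterClassName).
  static boolean chainContains(TokenFilterFactory[] chain, String filterClassName) {
    for (TokenFilterFactory factory : chain) {
      if (factory.getClass().getName().equals(filterClassName)) {
        return true;
      }
    }
    return false;
  }
}
{noformat}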

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: toString on Thread

2012-03-01 Thread Yonik Seeley
On Thu, Mar 1, 2012 at 5:30 PM, Dawid Weiss
dawid.we...@cs.put.poznan.pl wrote:
 Ouch!
 Now I've got to go think if we've done anything like that in Solr...

 Yeah... I honestly never thought about such possibility and I don't
 think any sane person would ;)

 I think this qualifies as a hack similar to the solution to this puzzler:
 http://wouter.coekaerts.be/2012/puzzle-clowns

Cute!  Hadn't seen that.  My first thought of a solution matched the
first comment on the solutions page (no spoilers!).
I missed the more elegant & official solution.

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [Lucene.Net] Official stance on API changes between major versions

2012-03-01 Thread Andy Pook
I am not a committer, but my company makes extensive use of Lucene.Net, so
here's my two pennies...

I understand that there is a commendable motivation to be gentle with API
changes, wanting to give plenty of warning by obsoleting methods.

Several points. First, there is a change to the major version
number, so users should expect changes to the API.
Next, when this project was restarted last year, the stated direction was to
get caught up with the Java version and also to move towards a more dotnet
style interface.

The discussions on the list do occasionally get bogged down in this kind of
to and fro. A coach my sports team once had said something along these
lines: if the team can't choose, then no one has made a convincing
argument, so make a choice, any choice, and just get on with it. If it turns
out to be the wrong choice, then at least you've learnt something.
This is software. It's changeable.

My bias is that I want what's in V4 (codecs, NRT etc). I'm willing to take
some pain if it means this project can accelerate.
I would imagine that most serious uses of Lucene would be hidden within a
service or at least isolated in some way, not dotted around all over the
application. This is what isolation is for, to protect components from
change. The impact of even fairly major API changes should be quite
localised and refactorable. Intimidating, yes. More than a bit scary, of
course. But worth it for getting the newer bits.
By all means be professional, make proposals, have some discussion. But
please let's not be too conservative, too timid.

2.9.4g is a good release. We've been using it since shortly after it seemed
stable. If there are users that need some stability then they should be
advised to stick with g for a while.

Now that that is done: a hearty thank you for the work on both the code
and the Apache process. My vote would be for some more radical changes to
be allowed. Let's get through 3.0.3 and on to 3.5 and 4.0. Let's get to one
of the original goals, which is functional parity with Java, and let's be bold
with some of the dotnet modifications (note that being bold does not mean
that one is reckless).


I'm sure that some will say, yeah great sentiment, now send some patches. I
agree. I have sent some very minor patches previously and it frustrates me
that my company has not contributed more. We have just taken on a lot more
people so I hope that we can be more active with Lucene.net soon.

--Andy

On 28 February 2012 18:17, Christopher Currens currens.ch...@gmail.comwrote:

 I *really* don't mean to be a bother to anyone, but I'd like to continue
 work on this.  I feel that until I can get a better sense of how the group
 feels about this, I can't make much progress.  Perhaps this radio silence
 is just because this email thread got lost in among the others.

 On Fri, Feb 24, 2012 at 6:50 PM, Prescott Nasser geobmx...@hotmail.com
 wrote:

  I'm not against breaking compatibility when changing the version number to
  a new major, 2 - 3. I'm not sure how others feel. Matching Java access
  modifiers seems like the right move.

  That said, what if we mark things obsolete in 3.0.3 and, when we make the
  jump to 4.0, wipe them out? In my head we shouldn't spend too much time
  cleaning up 3.0.3 aside from bug fixes if we're just going to swap it for
  4.0 in the near future.
 
  There has to be a break at some point, making it with a major release is
  the best place to make it.
 
  Sent from my Windows Phone
  
  From: Christopher Currens
  Sent: 2/24/2012 2:45 PM
  To: lucene-net-...@lucene.apache.org
  Subject: [Lucene.Net] Official stance on API changes between major
 versions
 
  A bit of background about what I've been doing lately on the project.
   Because we've now confirmed that the .NET 3.0.3 branch is a completed port
  of the Java 3.0.3 version, I've been spending time trying to work on some of
  the bugs and improvements that are assigned to this version.  There
 wasn't
  any real discussion about the actual features, I just created some (based
  on mailing list discussions) and assigned them to the 3.0.3 release.  The
  improvements I've been working on lately are ones that have bugged me
  specifically since I've started using Lucene.NET.
 
  I've worked on https://issues.apache.org/jira/browse/LUCENENET-468 and
  https://issues.apache.org/jira/browse/LUCENENET-470 so far.
 
  LUCENENET-470 is pretty much completed: all of the classes that implemented
  Closeable() now implement IDisposable, having a public void Dispose()
  and/or protected virtual void Dispose(bool disposing), depending on whether
  the class is sealed or not.  What is left to do on that issue would be to
  make sure that all of the tests are a) overriding the protected Dispose
  method as needed and b) actually calling Dispose or wrapped in a using
  statement.
 
  I've done quite a bit of work on LUCENENET-468, as well, though it is
 going
  far slower than 470, because there's a lot more 

[jira] [Updated] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

2012-03-01 Thread Mike Spencer (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Spencer updated SOLR-3185:
---

Description: 
Using solr.PatternReplaceCharFilterFactory to replace {noformat}A & B{noformat} 
with {noformat}A&B{noformat} will result in {noformat}A&amp;B{noformat} being 
indexed. Query analysis will give the expected result of 
{noformat}A&B{noformat}. I examined the index with both standalone Luke and the 
schema browser field and the index value is incorrect in both tools.

This is the affected charFilter:
{noformat}
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(^\w)\s[&amp;]\s(\w)"
replacement="$1&amp;$2" />
{noformat}

  was:
Using solr.PatternReplaceCharFilterFactory to replace 'A & B' (no quotes) with 
'A&B' (no spaces) will result in 'A&amp;amp;B' being indexed. Query analysis 
will give the expected result of 'A&B'. I examined the index with both 
standalone Luke and the schema browser field and the index value is incorrect 
in both tools.

This is the affected charFilter:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="(^\w)\s[&amp;]\s(\w)"
replacement="$1&amp;amp;$2" />


 PatternReplaceCharFilterFactory can't replace with ampersands in index
 --

 Key: SOLR-3185
 URL: https://issues.apache.org/jira/browse/SOLR-3185
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.5
Reporter: Mike Spencer
Priority: Minor
  Labels: PatternReplaceCharFilter, regex

 Using solr.PatternReplaceCharFilterFactory to replace {noformat}A & 
 B{noformat} with {noformat}A&B{noformat} will result in 
 {noformat}A&amp;B{noformat} being indexed. Query analysis will give the 
 expected result of {noformat}A&B{noformat}. I examined the index with both 
 standalone Luke and the schema browser field and the index value is incorrect 
 in both tools.
 This is the affected charFilter:
 {noformat}
 <charFilter class="solr.PatternReplaceCharFilterFactory"
 pattern="(^\w)\s[&amp;]\s(\w)"
 replacement="$1&amp;$2" />
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3185) PatternReplaceCharFilterFactory can't replace with ampersands in index

2012-03-01 Thread Mike Spencer (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220605#comment-13220605
 ] 

Mike Spencer commented on SOLR-3185:


Sorry, had improper formatting before. Due to how the XML configuration needs 
to deal with ampersands, I have to use the &amp; code instead of the & 
character. It reads it fine but writes it literally instead of outputting the 
ampersand character.



 PatternReplaceCharFilterFactory can't replace with ampersands in index
 --

 Key: SOLR-3185
 URL: https://issues.apache.org/jira/browse/SOLR-3185
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 3.5
Reporter: Mike Spencer
Priority: Minor
  Labels: PatternReplaceCharFilter, regex

 Using solr.PatternReplaceCharFilterFactory to replace {noformat}A & 
 B{noformat} with {noformat}A&B{noformat} will result in 
 {noformat}A&amp;B{noformat} being indexed. Query analysis will give the 
 expected result of {noformat}A&B{noformat}. I examined the index with both 
 standalone Luke and the schema browser field and the index value is incorrect 
 in both tools.
 This is the affected charFilter:
 {noformat}
 <charFilter class="solr.PatternReplaceCharFilterFactory"
 pattern="(^\w)\s[&amp;]\s(\w)"
 replacement="$1&amp;$2" />
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module

2012-03-01 Thread Ryan McKinley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220614#comment-13220614
 ] 

Ryan McKinley commented on LUCENE-3795:
---

OK, I think the branch is ready to go.

The one thing I don't like is that the spatial4j.jar gets included twice, once 
in the module's 'lib' directory and again in the Solr lib directory.  I could 
not figure out how to have the Solr build compile and distribute this one.

 Replace spatial contrib module with LSP's spatial-lucene module
 ---

 Key: LUCENE-3795
 URL: https://issues.apache.org/jira/browse/LUCENE-3795
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.0


 I propose that Lucene's spatial contrib module be replaced with the 
 spatial-lucene module within Lucene Spatial Playground (LSP).  LSP has been 
 in development for approximately 1 year by David Smiley, Ryan McKinley, and 
 Chris Male and we feel it is ready.  LSP is here: 
 http://code.google.com/p/lucene-spatial-playground/  and the spatial-lucene 
 module is intuitively in svn/trunk/spatial-lucene/.
 I'll add more comments to prevent the issue description from being too long.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3795) Replace spatial contrib module with LSP's spatial-lucene module

2012-03-01 Thread David Smiley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220688#comment-13220688
 ] 

David Smiley commented on LUCENE-3795:
--

For those following along here, the former spatial-base module portion of 
this code is now an ASL-licensed 3rd-party jar dependency: Spatial4J 
(http://spatial4j.com). Basically, half of LSP is there now, going by this new 
name. The other half is here, as the new Lucene spatial module.

I agree that the branch looks ready to be merged into trunk.

 Replace spatial contrib module with LSP's spatial-lucene module
 ---

 Key: LUCENE-3795
 URL: https://issues.apache.org/jira/browse/LUCENE-3795
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/spatial
Reporter: David Smiley
Assignee: David Smiley
 Fix For: 4.0


 I propose that Lucene's spatial contrib module be replaced with the 
 spatial-lucene module within Lucene Spatial Playground (LSP).  LSP has been 
 in development for approximately 1 year by David Smiley, Ryan McKinley, and 
 Chris Male and we feel it is ready.  LSP is here: 
 http://code.google.com/p/lucene-spatial-playground/  and the spatial-lucene 
 module is intuitively in svn/trunk/spatial-lucene/.
 I'll add more comments to prevent the issue description from being too long.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3060) add highlighter support to SurroundQParserPlugin

2012-03-01 Thread Shalu Singh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13220696#comment-13220696
 ] 

Shalu Singh commented on SOLR-3060:
---

Hi Ahmet, I am trying to include the SOLR-2703.patch into Solr 3.5, downloaded 
from the SVN branches, to provide the surround parser. But it is not working 
after including the SOLR-2703 patch. Do you know how to apply it?

 add highlighter support to  SurroundQParserPlugin
 -

 Key: SOLR-3060
 URL: https://issues.apache.org/jira/browse/SOLR-3060
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 4.0
Reporter: Ahmet Arslan
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3060.patch, SOLR-3060.patch


 Highlighter does not recognize SrndQuery family.
 http://search-lucene.com/m/FuDsU1sTjgM
 http://search-lucene.com/m/wD8c11gNTb61

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org