Re: dismax vs edismax

2012-11-27 Thread Jack Krupansky
My view is that if we simply added an option to edismax to restrict the 
syntax to the very limited syntax of dismax, then we could have one, common 
xdismax query parser.


And then, why not simply rename the current Solr query parser to "classic" 
and make the new xdismax be the default Solr query parser.


And then... push a lot of the so-called "Solr-specific" features down into 
the Lucene query parser (abstracting away the specifics of Solr schema, Solr 
plugin, Solr parameter format, etc.) and then we can have one, unified query 
parser for Lucene and Solr. But... not everyone is persuaded!
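
To make that concrete, a purely hypothetical SolrJ sketch (neither an 
"xdismax" parser nor a "strictSyntax" parameter exists today - they only 
illustrate the proposed option):

    import org.apache.solr.client.solrj.SolrQuery;

    public class XDismaxSketch {
      public static void main(String[] args) {
        SolrQuery q = new SolrQuery("ipod AND belkin");
        q.set("defType", "xdismax");    // the one, common parser (hypothetical name)
        q.set("strictSyntax", "true");  // hypothetical switch: accept only dismax's limited syntax
        q.set("qf", "title^2 body");
        System.out.println(q);
      }
    }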


-- Jack Krupansky

-Original Message- 
From: David Smiley (@MITRE.org)

Sent: Tuesday, November 27, 2012 11:43 PM
To: dev@lucene.apache.org
Subject: dismax vs edismax

It was my hope that by now, the dismax & edismax distinction would be a thing 
of the past, such that we'd simply call this by one name, "dismax". 
From memories of various JIRA commentary, Jan wants this too and made great 
progress enhancing edismax, but Hoss pushed back on edismax overtaking 
dismax as "the" one new dismax.  I see this as very unfortunate, as having 
both complicates things and makes it harder to write about them in books ;-)  I'd 
love to simply say "dismax" without having to say "edismax" or wonder 
whether someone who said "dismax" meant "edismax", etc.  Does anyone see this 
changing / progressing?

~ David



-
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/dismax-vs-edismax-tp4022834.html

Sent from the Lucene - Java Developer mailing list archive at Nabble.com.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



dismax vs edismax

2012-11-27 Thread David Smiley (@MITRE.org)
It was my hope that by now, the dismax & edismax distinction would be a thing 
of the past, such that we'd simply call this by one name, "dismax". 
From memories of various JIRA commentary, Jan wants this too and made great 
progress enhancing edismax, but Hoss pushed back on edismax overtaking 
dismax as "the" one new dismax.  I see this as very unfortunate, as having 
both complicates things and makes it harder to write about them in books ;-)  I'd 
love to simply say "dismax" without having to say "edismax" or wonder 
whether someone who said "dismax" meant "edismax", etc.  Does anyone see this 
changing / progressing?

~ David



-
Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
--
View this message in context: 
http://lucene.472066.n3.nabble.com/dismax-vs-edismax-tp4022834.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui

2012-11-27 Thread Radim Kolar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505203#comment-13505203
 ] 

Radim Kolar commented on SOLR-4041:
---

no

> Allow segment merge monitoring in Solr Admin gui
> 
>
> Key: SOLR-4041
> URL: https://issues.apache.org/jira/browse/SOLR-4041
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Radim Kolar
>Assignee: Mark Miller
>Priority: Minor
>  Labels: patch
> Fix For: 4.1, 5.0
>
> Attachments: solr-monitormerge.txt
>
>
> add solrMbean for ConcurrentMergeScheduler

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent

2012-11-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505201#comment-13505201
 ] 

Yonik Seeley commented on SOLR-2052:


bq. the fact that existing tests pass when you remove it doesn't in and of 
itself fill me with tons of confidence that it's definitely safe.

+1

I think some of the history is that stock Solr never used both at the same 
time, and because of that maybe not every place in the code worked with both, 
and because of that someone added the check/restriction.  I have no idea if all 
code paths would work using both, but it's not something the current tests 
test for (since it's prohibited).  Changes to that should be backed up by new 
tests, and coverage analysis and/or a thorough code review.

> Allow for a list of filter queries and a single docset filter in 
> QueryComponent
> ---
>
> Key: SOLR-2052
> URL: https://issues.apache.org/jira/browse/SOLR-2052
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0-ALPHA
> Environment: Mac OS X, Java 1.6
>Reporter: Stephen Green
>Priority: Minor
> Fix For: 4.1
>
> Attachments: SOLR-2052-2.patch, SOLR-2052-3-6-1.patch, 
> SOLR-2052-3.patch, SOLR-2052-4_0_0.patch, SOLR-2052-4.patch, SOLR-2052.patch, 
> SOLR-2052-trunk.patch
>
>
> SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries 
> or a single filter (as a DocSet), but not both.  This restriction seems 
> arbitrary, and there are cases where we can have both a list of filter 
> queries and a DocSet generated by some other non-query process (e.g., 
> filtering documents according to IDs pulled from some other source like a 
> database.)
> Fixing this requires a few small changes to SolrIndexSearcher to allow both 
> of these to be set for a QueryCommand and to take both into account when 
> evaluating the query.  It also requires a modification to ResponseBuilder to 
> allow setting the single filter at query time.
> I've run into this against 1.4, but the same holds true for the trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



LUCENE-4569

2012-11-27 Thread John Wang
Hey guys:

   Any feedback on:

https://issues.apache.org/jira/browse/LUCENE-4569?

Thanks

-John


[jira] [Commented] (SOLR-4030) Use Lucene segment merge throttling

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505161#comment-13505161
 ] 

Mark Miller commented on SOLR-4030:
---

This would be nice to add - could you update this patch to 5x/trunk, Radim?

> Use Lucene segment merge throttling
> ---
>
> Key: SOLR-4030
> URL: https://issues.apache.org/jira/browse/SOLR-4030
> Project: Solr
>  Issue Type: Improvement
>Reporter: Radim Kolar
>Assignee: Mark Miller
>Priority: Minor
>  Labels: patch
> Fix For: 4.1, 5.0
>
> Attachments: solr-mergeratelimit.txt
>
>
> add argument "maxMergeWriteMBPerSec" to Solr directory factories.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505155#comment-13505155
 ] 

Mark Miller commented on SOLR-4041:
---

Hmm...I'm not sure I'm the biggest fan of introducing a Solr-specific 
ConcurrentMergeScheduler subclass for this. Couldn't we just check for the type 
of merge scheduler, and on the right types add the right MBeans - and avoid 
introducing this new Solr-specific impl?
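
Something along these lines, roughly (a sketch only - the "infoRegistry" 
registration and the InfoBean class are made-up names, not a patch):

{code:java}
// Sketch: inspect the configured scheduler and register monitoring beans
// only for the types we know how to monitor, instead of adding a
// Solr-specific MergeScheduler subclass.
MergeScheduler ms = indexWriterConfig.getMergeScheduler();
if (ms instanceof ConcurrentMergeScheduler) {
  ConcurrentMergeScheduler cms = (ConcurrentMergeScheduler) ms;
  // expose e.g. cms.getMaxMergeCount() / cms.getMaxThreadCount() via JMX;
  // ConcurrentMergeSchedulerInfoBean is a hypothetical wrapper
  infoRegistry.put("mergeScheduler", new ConcurrentMergeSchedulerInfoBean(cms));
}
{code}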

> Allow segment merge monitoring in Solr Admin gui
> 
>
> Key: SOLR-4041
> URL: https://issues.apache.org/jira/browse/SOLR-4041
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Radim Kolar
>Assignee: Mark Miller
>Priority: Minor
>  Labels: patch
> Fix For: 4.1, 5.0
>
> Attachments: solr-monitormerge.txt
>
>
> add solrMbean for ConcurrentMergeScheduler

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Mark Miller

On Nov 27, 2012, at 8:07 PM, Michael McCandless  
wrote:
>> There are almost like 40 committers. Most with day jobs, families, hobbies, 
>> etc. Many, many, many that are not paid to commit your patches.
> 
> 40 committers perhaps on paper but precious few are active and even
> fewer review & commit new patches. 

That is also the nature of open source. People get involved and, over time, move 
on. Some duck back in here and there; others fade off into the 
night. You generally have many transients and a smaller active core. I think 
this is not at all uncommon.

Even then, getting contributors' patches in is not normally going to be a high 
priority unless it happens to match a committer's (or their employer's) itch.

You have a lot of people donating a bit of time as they can in their busy lives 
- they will scratch their own issues and have little time for things not 
related to their itches. You have a few others that are paid to work on this 
stuff, but that generally involves specific company-oriented goals - not just 
committing feature xyz because someone happened to put up a patch. And many of 
those guys have other duties beyond working on Lucene/Solr.

We should work hard on committing more patches the same way we should donate 
more money to charity - it's a nice sentiment, you can't fault it, but it's up 
against reality.

It's really just the nature of Open Source. We could have 100 committers and it 
probably wouldn't even come close to doubling the amount of 3rd-party patches 
that get committed. It won't scale that way IMO. You will get a bunch more 
transients, a bunch more guys paid to work on specific things, and if you are 
lucky, perhaps a couple that donate their time for the general good of 
committing others' patches.

- Mark


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4083) Deprecate specifying individual <core> information in solr.xml. Possibly deprecate solr.xml entirely

2012-11-27 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505135#comment-13505135
 ] 

Erick Erickson commented on SOLR-4083:
--

Looking for some advice here. The problem is that parsing the solr.xml file in 
CoreContainer.java is intimately tied to the fact that it's XML, through 
the use of the Config class. It starts out simply in initialize, but then 
tendrils of XML-ese (XPath) are scattered all over the file.

So, I've thought of a few options.
1> I could be really slimy and, in initialize, instead of using the canned 
solr.xml (DEF_SOLR_XML), _construct_ a new string that looks just like 
a really big solr.xml file from all the cores that are discovered, and pass 
_that_ in just as things are done now. But I think I'd have to wash my hands 
afterwards. That doesn't get us any further in terms of obsoleting solr.xml. 
It'd be fast, though, with minimal code disruption. But it'd build an XML file 
just to parse it into the DOM. Wasteful.

2> Abstract all of the current XPath/XML-specific stuff in CoreContainer into a 
thunking layer that understands both ways of looking at the world. If it was 
initialized from a current solr.xml, it'd just pass all the stuff right through 
to the current Config. If it was populated by discovery, it'd resolve the request 
"natively". So, for instance, a call like

cfg.getInt("solr/cores/@swappableCacheSize", Integer.MAX_VALUE)

in CoreContainer would be replaced by

newcfg.getCoresInt("swappableCacheSize", Integer.MAX_VALUE)

Under the covers, this would resolve to something like

if (initialized from discovery) return 
newcfg.coresprops.getInt("swappableCacheSize", Integer.MAX_VALUE);
else return oldcfg.getInt("solr/cores/@swappableCacheSize", Integer.MAX_VALUE);

There'd be a newcfg.solr.get### for things defined in the <solr> tag, 
etc. (A rough sketch of this facade appears after the questions below.)

3> Take my current notion of a pluggable CoreDescriptorProvider and just make 
it not pluggable. Populate it up front with the discovery process and go from 
there. This seems more consistent with the ZK CoreDescriptorProvider that's 
already there.

4> ?

I _think_ the whole question of whether config files live in a central 
directory, and the rest of this discussion, is orthogonal to this issue. There 
are really two questions here, I guess:
a> Is the trouble/complexity of a thunking layer worth the effort? It'll go a 
ways towards separating out the requirement of XML parsing for solr.xml, at a 
cost of that complexity. Not sure it's a good call.
b> Is there another approach that I'm overlooking?

Of the three, I'm torn between <2> and <3>; I can argue either way. <1> 
seems really hacky.
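
For <2>, the facade might look roughly like this (all names made up for 
illustration):

{code:java}
// One interface, two impls: one delegating to the existing XPath-based
// Config, one backed by the properties gathered during discovery.
interface CoresConfig {
  int getCoresInt(String attr, int def);
}

class XmlCoresConfig implements CoresConfig {
  private final Config cfg;  // the current solr.xml-backed Config
  XmlCoresConfig(Config cfg) { this.cfg = cfg; }
  public int getCoresInt(String attr, int def) {
    return cfg.getInt("solr/cores/@" + attr, def);
  }
}

class DiscoveredCoresConfig implements CoresConfig {
  private final java.util.Properties coresProps;  // from solrcore.properties discovery
  DiscoveredCoresConfig(java.util.Properties props) { this.coresProps = props; }
  public int getCoresInt(String attr, int def) {
    String v = coresProps.getProperty(attr);
    return v == null ? def : Integer.parseInt(v);
  }
}
{code}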

> Deprecate specifying individual <core> information in solr.xml. Possibly 
> deprecate solr.xml entirely
> 
>
> Key: SOLR-4083
> URL: https://issues.apache.org/jira/browse/SOLR-4083
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis
>Affects Versions: 4.1, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> Spinoff from SOLR-1306. Having a solr.xml file is limiting and possibly 
> unnecessary. We'd gain flexibility by having an "auto-discovery", essentially 
> walking the directories and finding all the cores and just loading them.
> Here's an issue to start the discussion of what that would look like. At this 
> point the way I'm thinking about it depends on SOLR-1306, which depends on 
> SOLR-1028, so the chain is getting kind of long.
> Straw-man proposal:
> 1> system properties can be specified as root paths in the solr tree to start 
> discovery.
> 2> the directory walking process will stop going deep (but not wide) in the 
> directories whenever a solrcore.properties file is encountered. That file can 
> contain any of the properties currently specifiable in a <core> tag. This 
> allows, for instance, re-use of a single solrconfig.xml or schema.xml file 
> across multiple cores. I really don't want to get into having 
> cores-within-cores. While this latter is possible, I don't see any advantage. 
> You _can_ have multiple roots, and there's _no_ requirement that the cores be 
> in the directory immediately below that root; they can be arbitrarily deep.
> 3> I'm not quite sure what to do with the various properties in the <solr> 
> tag. Perhaps just require these to be system properties?
> 4> Notice the title. Does it still make sense to specify <3> in solr.xml but 
> ignore the cores stuff? It seems like so little information will be in 
> solr.xml if we take all the <core> tags out that we should just kill it 
> altogether.
> 5> Not quite sure what this means for _where_ the cores live. Is it 
> arbitrary? Anywhere on disk? Why not?
> 6> core swapping/renaming/whatever. Really, this is about how we m

[jira] [Commented] (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent

2012-11-27 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505132#comment-13505132
 ] 

Hoss Man commented on SOLR-2052:


Aaron: I don't really know/understand why the restriction is currently in 
place, but the fact that existing tests pass when you remove it doesn't in and 
of itself fill me with tons of confidence that it's definitely safe.

If you could add some tests to the patch demonstrating a codepath where both 
the Query filter and the DocSet filter get used, then I would be convinced 
(even just having a low-level test that directly instantiated a 
SolrIndexSearcher.QueryCommand using both and executed it, demonstrating correct 
behavior, would be reassuring).
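
Something like this, roughly (sketch only; the searcher setup and the 
externally-built DocSet are elided):

{code:java}
SolrIndexSearcher.QueryCommand cmd = new SolrIndexSearcher.QueryCommand();
cmd.setQuery(new MatchAllDocsQuery());
cmd.setFilterList(new TermQuery(new Term("cat", "a")));  // the filter-query list
cmd.setFilter(externalDocSet);   // the DocSet filter, e.g. IDs from a database
cmd.setLen(10);
SolrIndexSearcher.QueryResult result = new SolrIndexSearcher.QueryResult();
searcher.search(result, cmd);
// assert the result is the intersection of the query and *both* filters
{code}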

> Allow for a list of filter queries and a single docset filter in 
> QueryComponent
> ---
>
> Key: SOLR-2052
> URL: https://issues.apache.org/jira/browse/SOLR-2052
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0-ALPHA
> Environment: Mac OS X, Java 1.6
>Reporter: Stephen Green
>Priority: Minor
> Fix For: 4.1
>
> Attachments: SOLR-2052-2.patch, SOLR-2052-3-6-1.patch, 
> SOLR-2052-3.patch, SOLR-2052-4_0_0.patch, SOLR-2052-4.patch, SOLR-2052.patch, 
> SOLR-2052-trunk.patch
>
>
> SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries 
> or a single filter (as a DocSet), but not both.  This restriction seems 
> arbitrary, and there are cases where we can have both a list of filter 
> queries and a DocSet generated by some other non-query process (e.g., 
> filtering documents according to IDs pulled from some other source like a 
> database.)
> Fixing this requires a few small changes to SolrIndexSearcher to allow both 
> of these to be set for a QueryCommand and to take both into account when 
> evaluating the query.  It also requires a modification to ResponseBuilder to 
> allow setting the single filter at query time.
> I've run into this against 1.4, but the same holds true for the trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Michael McCandless
On Tue, Nov 27, 2012 at 7:55 PM, Mark Miller  wrote:
>
> On Nov 27, 2012, at 7:50 PM, Radim Kolar  wrote:
>
>> why you do not have more committers to process patches quickly?

We will forever not have enough committers, unfortunately.

That said, it's awful when patches are ignored, as yours were for a bit.

So I for one am happy you kept calling attention to them.  Thank you
for posting the patches, and for persisting!  And, sorry that this is
necessary in open-source ...

> There are almost like 40 committers. Most with day jobs, families, hobbies, 
> etc. Many, many, many that are not paid to commit your patches.

40 committers perhaps on paper but precious few are active and even
fewer review & commit new patches.  We all should strive to do it
more.

Mike McCandless

http://blog.mikemccandless.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Lance Norskog
To be fair, it is worth having other people look at your patches. 
Not that anybody looks at mine :(

- Original Message -
| From: "Mark Miller" 
| To: dev@lucene.apache.org
| Sent: Tuesday, November 27, 2012 4:55:21 PM
| Subject: Re: Active 4.x branches?
| 
| 
| On Nov 27, 2012, at 7:50 PM, Radim Kolar  wrote:
| 
| > why you do not have more committers to process patches quickly?
| 
| There are almost like 40 committers. Most with day jobs, families,
| hobbies, etc. Many, many, many that are not paid to commit your
| patches.
| 
| It's a marathon, not a race. Welcome to Open Source :)
| 
| - Mark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui

2012-11-27 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505129#comment-13505129
 ] 

Lance Norskog commented on SOLR-4041:
-

Cool! I have done monitoring of segment sizes with fixed-time polling, and 
post-commit polling of the data/index directory. This makes it easier to chart 
other aspects of merging. Another useful number is the current number of 
segments.
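
For reference, the polling I've been doing boils down to roughly this 
(sketch; "dir" is the index Directory being watched):

{code:java}
// one leaf reader per segment, so this is the current segment count
DirectoryReader reader = DirectoryReader.open(dir);
int segmentCount = reader.leaves().size();
reader.close();
{code}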

> Allow segment merge monitoring in Solr Admin gui
> 
>
> Key: SOLR-4041
> URL: https://issues.apache.org/jira/browse/SOLR-4041
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Radim Kolar
>Assignee: Mark Miller
>Priority: Minor
>  Labels: patch
> Fix For: 4.1, 5.0
>
> Attachments: solr-monitormerge.txt
>
>
> add solrMbean for ConcurrentMergeScheduler

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Mark Miller

On Nov 27, 2012, at 7:50 PM, Radim Kolar  wrote:

> why you do not have more committers to process patches quickly?

There are almost like 40 committers. Most with day jobs, families, hobbies, 
etc. Many, many, many that are not paid to commit your patches.

It's a marathon, not a race. Welcome to Open Source :)

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Radim Kolar

On 28.11.2012 1:25, Jack Krupansky wrote:
As the official guide book states, "Please be patient. Committers are 
busy people too. If no one responds to your patch after a few days, 
please make friendly reminders."

why you do not have more committers to process patches quickly?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Jack Krupansky
As the official guide book states, "Please be patient. Committers are busy 
people too. If no one responds to your patch after a few days, please make 
friendly reminders."


See:
http://wiki.apache.org/lucene-java/HowToContribute#Contributing_your_work

-- Jack Krupansky

-Original Message- 
From: Radim Kolar

Sent: Tuesday, November 27, 2012 11:27 AM
To: dev@lucene.apache.org
Subject: Re: Active 4.x branches?


Don't get hung up waiting for a committer to contribute, especially in 
documentation - in fact people new to the process are uniquely positioned 
to recognize gaps here - there is a lot you can do without commit rights.

I submitted 3 patches, you guys didn't do anything with it.

https://issues.apache.org/jira/browse/SOLR-4041
https://issues.apache.org/jira/browse/SOLR-4030
https://issues.apache.org/jira/browse/SOLR-4029






-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4566) SearcherManager.afterRefresh() issues

2012-11-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505090#comment-13505090
 ] 

Michael McCandless commented on LUCENE-4566:


Why do we need afterClose in the listener?  It seems like the app can handle 
this itself?  I think for NRTManager we should just keep using the protected 
method ...

I think we don't need a protected method afterRefresh?  It should just be 
private, and it invokes the listeners?

Can we just use sync'd list for the listeners (eg like SegmentCoreReader's 
listeners)?
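
I.e., something along these lines (sketch only; names illustrative):

{code:java}
private final List<RefreshListener> refreshListeners =
    Collections.synchronizedList(new ArrayList<RefreshListener>());

public void addListener(RefreshListener listener) {
  refreshListeners.add(listener);
}

// afterRefresh becomes private and just notifies the listeners
private void afterRefresh() {
  synchronized (refreshListeners) {
    for (RefreshListener listener : refreshListeners) {
      listener.afterRefresh();
    }
  }
}
{code}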

> SearcherManager.afterRefresh() issues
> -
>
> Key: LUCENE-4566
> URL: https://issues.apache.org/jira/browse/LUCENE-4566
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: selckin
>Priority: Minor
> Attachments: LUCENE-4566-double-listeners.patch, LUCENE-4566.patch
>
>
> 1) ReferenceManager.doMaybeRefresh seems to call afterRefresh even if it 
> didn't refresh/swap (when newReference == null)
> 2) It would be nice if users were allowed to override 
> SearcherManager.afterRefresh() to get notified when a new searcher is in 
> action.
> But SearcherManager and ReaderManager are final, while NRTManager is not.
> The only way to currently hook into when a new searcher is created is using 
> the factory, but if you wish to do some async task then, there are no 
> guarantees that acquire() will return the new searcher, so you have to pass 
> it around and incRef manually. Whereas if allowed to hook into afterRefresh you 
> can just rely on acquire() & the existing infra you have around it to give you 
> the latest one.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505088#comment-13505088
 ] 

Commit Tag Bot commented on LUCENE-4576:


[branch_4x commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1414478

LUCENE-4576: Remove CachingWrapperFilter recacheDeletes boolean



> Remove CachingWrapperFilter recacheDeletes boolean
> --
>
> Key: LUCENE-4576
> URL: https://issues.apache.org/jira/browse/LUCENE-4576
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4576.patch
>
>
> I think this option is bad news, it's just a trap that causes caches to be 
> uselessly invalidated.
> If you really have a totally static index then just expunge your deletes.
> Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-4576.
-

   Resolution: Fixed
Fix Version/s: 5.0
   4.1

> Remove CachingWrapperFilter recacheDeletes boolean
> --
>
> Key: LUCENE-4576
> URL: https://issues.apache.org/jira/browse/LUCENE-4576
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
> Fix For: 4.1, 5.0
>
> Attachments: LUCENE-4576.patch
>
>
> I think this option is bad news, it's just a trap that causes caches to be 
> uselessly invalidated.
> If you really have a totally static index then just expunge your deletes.
> Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505072#comment-13505072
 ] 

Commit Tag Bot commented on LUCENE-4576:


[trunk commit] Robert Muir
http://svn.apache.org/viewvc?view=revision&revision=1414474

LUCENE-4576: Remove CachingWrapperFilter recacheDeletes boolean



> Remove CachingWrapperFilter recacheDeletes boolean
> --
>
> Key: LUCENE-4576
> URL: https://issues.apache.org/jira/browse/LUCENE-4576
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
> Attachments: LUCENE-4576.patch
>
>
> I think this option is bad news, it's just a trap that causes caches to be 
> uselessly invalidated.
> If you really have a totally static index then just expunge your deletes.
> Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2347) Use InputStream and not Reader for XML parsing

2012-11-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505056#comment-13505056
 ] 

Uwe Schindler commented on SOLR-2347:
-

It is not only XML files. In general, the encoding information of textual 
content should be determined by the parser. E.g., if you write a DIH instance 
reading from a network stream, the encoding might be defined by the headers 
(e.g., HTTP). In the case of XML it is defined by both the headers and the data 
itself (the <?xml ... ?> header). The data import handler should in this case 
work with InputStreams, so the encoding can be determined later (e.g., when 
reading unknown text files, ICU4J could autodetect the encoding, etc.). This 
would also make DIH fit better with Tika processing.

My proposal is to let DIH take InputStreams and let the encoding be determined 
in a later stage of processing.
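
For example, with StAX the stream can be handed over as-is and the parser 
resolves the encoding itself (sketch):

{code:java}
import java.io.FileInputStream;
import java.io.InputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// no Reader in between: the parser sees the BOM and the <?xml encoding="..."?>
// declaration and picks the charset itself
InputStream in = new FileInputStream("data.xml");
XMLStreamReader parser = XMLInputFactory.newInstance().createXMLStreamReader(in);
String encoding = parser.getEncoding();  // what the document actually declared
{code}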

> Use InputStream and not Reader for XML parsing
> --
>
> Key: SOLR-2347
> URL: https://issues.apache.org/jira/browse/SOLR-2347
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.1
>
>
> Followup to SOLR-96:
> Solr mostly uses java.io.Reader and passes this Reader to the XML parser. 
> According to the XML spec, an XML file should initially be seen as a binary 
> stream with a default charset of UTF-8 or another charset given by the network 
> protocol (like the Content-Type header in HTTP). But, very importantly, this 
> default charset is only a "hint" to the parser - the charset from the XML 
> header processing instruction is mandatory. Because of this, the parser must 
> be able to change the charset when reading the XML header (possibly also when 
> seeing BOM markers). This is not possible if the XML parser gets a 
> java.io.Reader instead of a java.io.InputStream. SOLR-96 already fixed this 
> for the XmlUpdateRequestHandler and the DocumentAnalysisRequestHandler. This 
> issue should fix the rest to conform to the XML spec (open schema.xml and 
> config.xml as an InputStream, not a Reader, and others).
> This change would not break anything in Solr (perhaps only backwards 
> compatibility in the API), as the default used by XML parsers is UTF-8.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505054#comment-13505054
 ] 

Mark Miller commented on SOLR-139:
--

bq. about the lack of documentation

Since you are eating some of this pain, perhaps you could give a hand when you 
have it figured out and contribute to our wiki? http://wiki.apache.org/solr/

> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
> SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way lucene is structured, (for now) one can only modify stored 
> fields.
> While we are at it, we can support incrementing an existing value - I think 
> this only makes sense for numbers.
> for background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505044#comment-13505044
 ] 

Robert Muir commented on LUCENE-4574:
-

Yes: that's the correct fix... already documented in this RelevanceComparator :)


> FunctionQuery ValueSource value computed twice per document
> ---
>
> Key: LUCENE-4574
> URL: https://issues.apache.org/jira/browse/LUCENE-4574
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0, 4.1
>Reporter: David Smiley
> Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work-around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
> at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-27 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505041#comment-13505041
 ] 

David Smiley commented on LUCENE-4574:
--

Ok, then SolrIndexSearcher.getDocListNC() should use TopDocsCollector when 
suitable.

p.s. my patch contained a small bug: "scorer = ..." vs "this.scorer = ..." in 
FieldComparator.java line 970.
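
The corrected line, for clarity (using the wrap() convenience from the patch):

{code:java}
this.scorer = ScoreCachingWrappingScorer.wrap(scorer);  // was: scorer = ...
{code}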

> FunctionQuery ValueSource value computed twice per document
> ---
>
> Key: LUCENE-4574
> URL: https://issues.apache.org/jira/browse/LUCENE-4574
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0, 4.1
>Reporter: David Smiley
> Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work-around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
> at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-27 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505040#comment-13505040
 ] 

Jack Krupansky commented on SOLR-139:
-

No apology necessary for the noise. I mean, none of us was able to offer a 
prompt response to earlier inquiries and this got me focused on actually trying 
the feature for the first time.

> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
> SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way lucene is structured, (for now) one can only modify stored 
> fields.
> While we are at it, we can support incrementing an existing value - I think 
> this only makes sense for numbers.
> for background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4032:
--

Priority: Blocker  (was: Major)

> Files larger than an internal buffer size fail to replicate
> ---
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
>Priority: Blocker
> Fix For: 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-27 Thread Lukas Graf (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505035#comment-13505035
 ] 

Lukas Graf commented on SOLR-139:
-

Thanks for your response. I thought I had the issue reduced to a simple enough 
test case, but apparently not. I will try again with a clean stock Solr 4.0, 
and file a separate issue if necessary, or look for support on the mailing 
list. My choice of words ('doesn't work as advertised') might have been 
influenced by frustration about the lack of documentation, sorry for the noise.

> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
> SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way lucene is structured, (for now) one can only modify stored 
> fields.
> While we are at it, we can support incrementing an existing value - I think 
> this only makes sense for numbers.
> for background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505034#comment-13505034
 ] 

Robert Muir commented on LUCENE-4574:
-

I don't think we should do this.

If you are sorting by score, use TopDocsCollector!

> FunctionQuery ValueSource value computed twice per document
> ---
>
> Key: LUCENE-4574
> URL: https://issues.apache.org/jira/browse/LUCENE-4574
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0, 4.1
>Reporter: David Smiley
> Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work-around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
> at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document

2012-11-27 Thread David Smiley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-4574:
-

Attachment: LUCENE-4574.patch

I did some more debugging.  It appears the problem will only occur when 
RelevanceComparator is involved, as it is the only FieldComparator that 
overrides setScorer().  It wraps the scorer with the caching scorer; however, 
ideally the passed-in scorer should already be wrapped as such (that is not the 
case).  Even if this comparator is the only one in Lucene that overrides this 
method, there's nothing stopping an app with a custom FieldComparator from doing 
the same.  I think the right place to inject the cache is TopFieldCollector's 
subclasses, of which there are two; each overrides setScorer() to 
store a copy of the scorer in a field.  Each should now look like:

{code:java}
@Override
public void setScorer(Scorer scorer) throws IOException {
  scorer = ScoreCachingWrappingScorer.wrap(scorer);
  this.scorer = scorer;
  comparator.setScorer(scorer);
}
{code}
(the .wrap() method is a convenience I added)

Sound good?  See the attached patch.
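
For reference, wrap() is roughly this (see the patch for the actual code):

{code:java}
public static Scorer wrap(Scorer scorer) {
  // avoid double-wrapping if the scorer already caches scores
  if (scorer instanceof ScoreCachingWrappingScorer) {
    return scorer;
  }
  return new ScoreCachingWrappingScorer(scorer);
}
{code}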

> FunctionQuery ValueSource value computed twice per document
> ---
>
> Key: LUCENE-4574
> URL: https://issues.apache.org/jira/browse/LUCENE-4574
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/search
>Affects Versions: 4.0, 4.1
>Reporter: David Smiley
> Attachments: LUCENE-4574.patch, Test_for_LUCENE-4574.patch
>
>
> I was working on a custom ValueSource and did some basic profiling and 
> debugging to see if it was being used optimally.  To my surprise, the value 
> was being fetched twice per document in a row.  This computation isn't 
> exactly cheap to calculate so this is a big problem.  I was able to 
> work-around this problem trivially on my end by caching the last value with 
> corresponding docid in my FunctionValues implementation.
> Here is an excerpt of the code path to the first execution:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> And here is the 2nd call:
> {noformat}
> at 
> org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48)
> at 
> org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153)
> at 
> org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56)
> at 
> org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951)
> at 
> org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312)
> at org.apache.lucene.search.Scorer.score(Scorer.java:62)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588)
> at 
> org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
> {noformat}
> The 2nd call appears to use some score caching mechanism, which is all well 
> and good, but that same mechanism wasn't used in the first call so there's no 
> cached value to retrieve.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4032:
--

Affects Version/s: (was: 4.0)
   5.0
Fix Version/s: (was: 4.1)

> Files larger than an internal buffer size fail to replicate
> ---
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 5.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-4032.
---

Resolution: Fixed

Thanks Markus!

> Files larger than an internal buffer size fail to replicate
> ---
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505001#comment-13505001
 ] 

Yonik Seeley commented on SOLR-4114:


bq. > So that we don't lose functionality we currently have?
bq. So now you care about backwards compatibility?

I was speaking specifically about functionality, not back compatibility.

bq. With your current solution there will be no "waiting until that machine 
comes back up". You will just end up with 8 slices, where 7 of them have 2 
replicas, and the last has only 1 replica.

Correct.  When I said "it's entirely reasonable for a user to want to wait", I 
meant wait to create the additional replica for one shard, not wait to create 
the whole collection.  Although I guess it might be useful to be able to fail 
collection creation if certain specified constraints aren't met (including a 
min replication factor).

As far as terminology, when I say replicationFactor of 3, I mean 3 copies of 
the data.  I also count the leader as a replica of a shard (which is logical).  
It follows from the clusterstate.json, which lists all "replicas" for a shard 
and one of them just has a flag indicating it's the leader.  This also makes it 
easier to talk about a shard having 0 replicas (meaning there is not even a 
leader).
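
For reference, a trimmed clusterstate.json fragment in that spirit
(illustrative only - host names are made up and the exact keys vary by
version), showing two replicas of shard1 with one of them carrying the leader
flag:

{code}
{"collection1":{
  "shards":{
    "shard1":{
      "replicas":{
        "host1:8983_solr_collection1":{
          "state":"active",
          "base_url":"http://host1:8983/solr",
          "leader":"true"},
        "host2:8983_solr_collection1":{
          "state":"active",
          "base_url":"http://host2:8983/solr"}}}}}}
{code}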


> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster 
> (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another that just joined the 
> cluster than it is to split an existing shard between the Solr server that 
> used to run it and the new one.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2012-11-27 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13505000#comment-13505000
 ] 

Commit Tag Bot commented on SOLR-4032:
--

[trunk commit] Mark Robert Miller
http://svn.apache.org/viewvc?view=revision&revision=1414428

SOLR-4032: Files larger than an internal buffer size fail to replicate.



> Files larger than an internal buffer size fail to replicate
> ---
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4032:
--

Summary: Files larger than an internal buffer size fail to replicate  (was: 
Unable to replicate between nodes ( read past EOF))

> Files larger than an internal buffer size fail to replicate
> ---
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4033) No lockType configured for NRTCachingDirectory

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504992#comment-13504992
 ] 

Mark Miller commented on SOLR-4033:
---

I'll double check some of this - and perhaps start using a lock - but I suspect 
this may be benign as long as it's not happening around index file access. I 
now write a few files through the Directory that don't need to worry about a 
lock factory (not index stuff).

> No lockType configured for NRTCachingDirectory
> --
>
> Key: SOLR-4033
> URL: https://issues.apache.org/jira/browse/SOLR-4033
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/No-lockType-configured-for-NRTCachingDirectory-td4017235.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Mark Miller
Yeesh - contributor complaint day.

I'll commit two of those.

- Mark

On Tue, Nov 27, 2012 at 11:27 AM, Radim Kolar  wrote:
>
>> Don't get hung up waiting for a committer to contribute, especially in
>> documentation - in fact people new to the process are uniquely positioned to
>> recognize gaps here - there is a lot you can do without commit rights.
>
> I submitted 3 patches, you guys didn't do anything with them.
>
> https://issues.apache.org/jira/browse/SOLR-4041
> https://issues.apache.org/jira/browse/SOLR-4030
> https://issues.apache.org/jira/browse/SOLR-4029
>
>
>
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>



-- 
- Mark

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4030) Use Lucene segment merge throttling

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4030:
--

 Priority: Minor  (was: Major)
Affects Version/s: (was: 5.0)
Fix Version/s: 5.0

> Use Lucene segment merge throttling
> ---
>
> Key: SOLR-4030
> URL: https://issues.apache.org/jira/browse/SOLR-4030
> Project: Solr
>  Issue Type: Improvement
>Reporter: Radim Kolar
>Assignee: Mark Miller
>Priority: Minor
>  Labels: patch
> Fix For: 4.1, 5.0
>
> Attachments: solr-mergeratelimit.txt
>
>
> add argument "maxMergeWriteMBPerSec" to Solr directory factories.
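
The one-line description above is a config knob; presumably it surfaces in
solrconfig.xml roughly like this (the element placement and example value are
assumptions - only the parameter name comes from the issue):

{code:xml}
<directoryFactory name="DirectoryFactory"
                  class="solr.NRTCachingDirectoryFactory">
  <!-- assumption: throttle segment-merge writes to ~20 MB/sec -->
  <double name="maxMergeWriteMBPerSec">20</double>
</directoryFactory>
{code}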

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4030) Use Lucene segment merge throttling

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-4030:
-

Assignee: Mark Miller

> Use Lucene segment merge throttling
> ---
>
> Key: SOLR-4030
> URL: https://issues.apache.org/jira/browse/SOLR-4030
> Project: Solr
>  Issue Type: Improvement
>Reporter: Radim Kolar
>Assignee: Mark Miller
>  Labels: patch
> Fix For: 4.1, 5.0
>
> Attachments: solr-mergeratelimit.txt
>
>
> add argument "maxMergeWriteMBPerSec" to Solr directory factories.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2347) Use InputStream and not Reader for XML parsing

2012-11-27 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504989#comment-13504989
 ] 

James Dyer commented on SOLR-2347:
--

Uwe, does your concern here entirely have to do with when DIH is indexing XML 
files?

> Use InputStream and not Reader for XML parsing
> --
>
> Key: SOLR-2347
> URL: https://issues.apache.org/jira/browse/SOLR-2347
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
> Fix For: 4.1
>
>
> Followup to SOLR-96:
> Solr mostly uses java.io.Reader and passes this Reader to the XML parser. 
> According to the XML spec, an XML file should initially be seen as a binary 
> stream with a default charset of UTF-8 or another charset given by the 
> network protocol (like the Content-Type header in HTTP). But, very 
> important: this default charset is only a "hint" to the parser - mandatory 
> is the charset from the XML header processing instruction. Because of this, 
> the parser must be able to change the charset when reading the XML header 
> (possibly also when seeing BOM markers). This is not possible if the XML 
> parser gets a java.io.Reader instead of a java.io.InputStream. SOLR-96 
> already fixed this for the XmlUpdateRequestHandler and the 
> DocumentAnalysisRequestHandler. This issue should fix the rest to conform to 
> the XML spec (open schema.xml and config.xml as InputStream not Reader, and 
> others).
> This change would not break anything in Solr (perhaps only backwards 
> compatibility in the API), as the default used by XML parsers is UTF-8.
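
A minimal Java sketch of the point (toy demo, not Solr code): hand the parser
bytes so it can honor the encoding declaration itself, rather than a Reader
that has already fixed the charset:

{code}
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XmlEncodingDemo {
  public static void main(String[] args) throws Exception {
    byte[] xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a>\u00e9</a>"
        .getBytes("ISO-8859-1");
    DocumentBuilder db =
        DocumentBuilderFactory.newInstance().newDocumentBuilder();

    // Correct: the parser sees raw bytes and switches to ISO-8859-1 itself.
    Document good = db.parse(new InputSource(new ByteArrayInputStream(xml)));
    System.out.println(good.getDocumentElement().getTextContent()); // é

    // Fragile: the Reader has already decoded the bytes as UTF-8, so the
    // encoding declaration can no longer be honored and the char is mangled.
    Document bad = db.parse(new InputSource(
        new InputStreamReader(new ByteArrayInputStream(xml), "UTF-8")));
    System.out.println(bad.getDocumentElement().getTextContent()); // U+FFFD
  }
}
{code}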

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4032) Unable to replicate between nodes ( read past EOF)

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504987#comment-13504987
 ] 

Mark Miller commented on SOLR-4032:
---

It's just silly bugs because the tests never replicated a file larger than the 
buffer size. I've added a @Nightly test that does.
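
The bug class in question is easy to state as a toy sketch (not the actual
replication code): a copy loop that only behaves while the whole file fits in
one buffer:

{code}
import java.io.*;

public class BufferCopyDemo {
  // BUG: stops after the first read(); anything past the first buffer is
  // silently dropped - exactly the "file larger than an internal buffer
  // size" failure mode.
  static void copyBuggy(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    int n = in.read(buf);
    if (n > 0) out.write(buf, 0, n);
  }

  // Correct: loop until EOF regardless of file size.
  static void copyCorrect(InputStream in, OutputStream out) throws IOException {
    byte[] buf = new byte[8192];
    int n;
    while ((n = in.read(buf)) != -1) {
      out.write(buf, 0, n);
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] big = new byte[20000]; // bigger than the 8192-byte buffer
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    copyBuggy(new ByteArrayInputStream(big), out);
    System.out.println(out.size()); // 8192, not 20000
    out.reset();
    copyCorrect(new ByteArrayInputStream(big), out);
    System.out.println(out.size()); // 20000
  }
}
{code}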

> Unable to replicate between nodes ( read past EOF)
> --
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4041) Allow segment merge monitoring in Solr Admin gui

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4041:
--

 Priority: Minor  (was: Major)
Affects Version/s: (was: 5.0)
Fix Version/s: 5.0
 Assignee: Mark Miller

> Allow segment merge monitoring in Solr Admin gui
> 
>
> Key: SOLR-4041
> URL: https://issues.apache.org/jira/browse/SOLR-4041
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Reporter: Radim Kolar
>Assignee: Mark Miller
>Priority: Minor
>  Labels: patch
> Fix For: 4.1, 5.0
>
> Attachments: solr-monitormerge.txt
>
>
> add solrMbean for ConcurrentMergeScheduler

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-4033) No lockType configured for NRTCachingDirectory

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-4033:
-

Assignee: Mark Miller

> No lockType configured for NRTCachingDirectory
> --
>
> Key: SOLR-4033
> URL: https://issues.apache.org/jira/browse/SOLR-4033
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/No-lockType-configured-for-NRTCachingDirectory-td4017235.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4032) Unable to replicate between nodes ( read past EOF)

2012-11-27 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-4032:
--

Attachment: SOLR-4032.patch

> Unable to replicate between nodes ( read past EOF)
> --
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504973#comment-13504973
 ] 

Mark Miller commented on SOLR-4114:
---

Solr 3.X to Solr 4.X back compat is not considered the same as Solr 4.0 to Solr 
4.1 back compat.

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster 
> (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another that just joined the 
> cluster than it is to split an existing shard between the Solr server that 
> used to run it and the new one.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4032) Unable to replicate between nodes ( read past EOF)

2012-11-27 Thread Markus Jelsma (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504961#comment-13504961
 ] 

Markus Jelsma commented on SOLR-4032:
-

Might be related to SOLR-4033 regarding NRTDir.

> Unable to replicate between nodes ( read past EOF)
> --
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504954#comment-13504954
 ] 

Per Steffensen commented on SOLR-4114:
--

bq. So that we don't lose functionality we currently have?

So now you care about backwards compatibility? :-) You didn't care about 
backwards compatibility from 3.6 to 4.0 when you introduced optimistic locking 
(including an error in case of updating an existing document without providing 
the correct version), which is forced upon you in 4.0 if you choose to run 
with version-field and update-log. There are perfectly valid reasons for 
wanting to use version-field and update-log without wanting full-blown 
optimistic locking. My solution to SOLR-3178 supports this kind of backwards 
compatibility by letting you explicitly choose among the update-semantics 
modes "classic", "consistency" and "classic-consistency-hybrid". So if you 
come from 3.6 and want backwards-compatible update semantics, but also want 
version-field and update-log, you just choose update-semantics "classic" :-) 
See http://wiki.apache.org/solr/Per%20Steffensen/Update%20semantics.
I'm just teasing you a little :-)

But anyway, I like backwards compatibility, so you are right: we probably do 
not want to do something that changes default behaviour in 4.0.0. I will have 
a look at a solution tomorrow. It is kinda late in Europe now.

bq. Example: you have 24 servers and create a collection with 8 shards and a 
target replication factor of 3... but one of the servers goes down in the 
meantime so one shard has only 2 replicas. It's entirely reasonable for a user 
to want to wait until that machine comes back up rather than doubling up on a 
different node.

I assume you mean a replication-factor of 2? With a replication-factor of 2 
you will get 3 shards per slice.

With your current solution there will be no "waiting until that machine comes 
back up". You will just end up with 8 slices, where 7 of them have 2 replicas 
and the last has only 1 replica. With the patch I provided today you will end 
up with 8 slices where all of them have 2 replicas - but one of the servers 
will be running two shards, and the Solr that was down will not be running any 
(when it comes back up). I would probably prefer my current solution - at 
least you achieve the property that any two servers can crash (including disk 
crash) without you losing data - which is basically what you want to achieve 
when you request a replication-factor of 2.

But waiting for the machine to come back up before creating the collection 
would certainly be the best solution. It is just extremely hard to know 
whether a machine is down or whether you intended to run one more server than 
is currently running. In general there is no information in Solr/ZK about 
that - and there shouldn't be. In this case a maxShardsPerNode could be a nice 
way to tell the system that you just want to wait. But then it would have to 
be implemented correctly, and that is really hard. In 
OverseerCollectionProcessor you can check whether you can meet the 
maxShardsPerNode requirement with the current set of live Solrs, and if you 
can't, just not initiate the creation process. But a server can go down 
between the time the OverseerCollectionProcessor checks and the time it is 
supposed to create a shard. Therefore it is impossible to guarantee that the 
OverseerCollectionProcessor does not create some shards of a new collection 
without being able to create them all while still living up to the 
maxShardsPerNode requirement. In such a case, if you really want to live up to 
the maxShardsPerNode requirement, the OverseerCollectionProcessor would have 
to try to delete the shards of the collection that were successfully created. 
But this deletion process can also fail. Ahhh, there is no guaranteed way.

Therefore my idea about the whole thing is aimed more at just having all the 
shards created, and then moving them around later. I know this is not possible 
for now, but I do expect that we (at least my project) will add support for 
(manual and/or automatic) migration of shards from one server to another. This 
feature is needed to achieve nice elasticity (moving shards/load onto new 
servers as they join the cluster), but also to do re-balancing after e.g. a 
Solr was down (and a shard that should have been placed on this server was 
temporarily created to run on another server).

Well, as I said, I will consider the best (small patch :-) ) solution 
tomorrow. But if I can't come up with a better small-patch solution we can 
certainly do the maxShardsPerNode thing - no problemo. It just isn't going to 
be 100% guaranteed.
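
The check-then-create gap described above can be stated as a minimal
self-contained sketch (toy code, not the actual OverseerCollectionProcessor):
the constraint check and the shard creation are separate steps, and a node can
disappear in between:

{code}
import java.util.*;

public class CheckThenCreateRace {
  public static void main(String[] args) {
    Set<String> liveNodes = new LinkedHashSet<String>(
        Arrays.asList("node1", "node2", "node3"));
    int maxShardsPerNode = 1;
    int shardsToCreate = 3;

    // Step 1: the up-front constraint check passes.
    if (liveNodes.size() * maxShardsPerNode < shardsToCreate) {
      throw new IllegalStateException("refusing to create collection");
    }

    // Step 2: a node dies between the check and the creation...
    liveNodes.remove("node3");

    // Step 3: ...so the creator must either double up on a surviving node
    // (violating maxShardsPerNode) or leave the collection partial.
    Map<String, Integer> shardsPerNode = new LinkedHashMap<String, Integer>();
    Iterator<String> it = liveNodes.iterator();
    for (int i = 0; i < shardsToCreate; i++) {
      if (!it.hasNext()) it = liveNodes.iterator(); // wrap around
      String node = it.next();
      Integer n = shardsPerNode.get(node);
      shardsPerNode.put(node, n == null ? 1 : n + 1);
    }
    System.out.println(shardsPerNode); // {node1=2, node2=1} - limit violated
  }
}
{code}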


> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: http

Re: Active 4.x branches?

2012-11-27 Thread Radim Kolar



> Don't get hung up waiting for a committer to contribute, especially in 
> documentation - in fact people new to the process are uniquely positioned to 
> recognize gaps here - there is a lot you can do without commit rights.

I submitted 3 patches, you guys didn't do anything with them.

https://issues.apache.org/jira/browse/SOLR-4041
https://issues.apache.org/jira/browse/SOLR-4030
https://issues.apache.org/jira/browse/SOLR-4029



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-1306) Support pluggable persistence/loading of solr.xml details

2012-11-27 Thread Erick Erickson (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-1306.
--

Resolution: Won't Fix

See the discussion at SOLR-4083. Rather than a pluggable core descriptor 
provider, we'll walk the directories under the cores root with certain rules 
and discover all the cores present. Simpler that way, since the cores need to 
be physically present anyway in order to be referenced from a pluggable 
architecture.
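
A minimal sketch of that discovery walk (illustrative only; the descriptor
file name "core.properties" is an assumption based on the SOLR-4083
discussion):

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class CoreDiscoverySketch {
  // Walk the tree under the cores root; any directory holding a core
  // descriptor file is treated as a core and not descended into further.
  public static List<File> findCores(File root) {
    List<File> cores = new ArrayList<File>();
    File[] children = root.listFiles();
    if (children == null) return cores;
    for (File child : children) {
      if (!child.isDirectory()) continue;
      if (new File(child, "core.properties").exists()) {
        cores.add(child);
      } else {
        cores.addAll(findCores(child));
      }
    }
    return cores;
  }
}
{code}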

> Support pluggable persistence/loading of solr.xml details
> -
>
> Key: SOLR-1306
> URL: https://issues.apache.org/jira/browse/SOLR-1306
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore
>Reporter: Noble Paul
>Assignee: Erick Erickson
> Fix For: 4.1
>
> Attachments: SOLR-1306.patch, SOLR-1306.patch, SOLR-1306.patch, 
> SOLR-1306.patch
>
>
> Persisting and loading details from one xml is fine if the no. of cores is 
> small and fixed. If there are 10's of thousands of cores in a single box, 
> adding a new core (with persistent=true) becomes very expensive because 
> every core creation has to write this huge xml. 
> Moreover, there is a good chance that the file gets corrupted and all the 
> cores become unusable. In that case I would prefer it to be stored in a 
> centralized DB which is backed up/replicated, so all the information is 
> available in a centralized location. 
> We may need to refactor CoreContainer to have a pluggable implementation 
> which can load/persist the details. The default implementation should 
> write/read from/to solr.xml. And the class should be pluggable as follows in 
> solr.xml
> {code:xml}
> <!-- solr.xml snippet stripped by the mail archiver; it showed an element
> declaring the pluggable SolrDataProvider implementation class -->
> {code}
> There will be a new interface (or abstract class) called SolrDataProvider 
> which this class must implement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504889#comment-13504889
 ] 

Yonik Seeley commented on SOLR-4114:


bq. Well I see no reason to introduce (in the first step at least) a 
maxShardsPerNode. 

So that we don't lose functionality we currently have?
And I agree that it should be up to the user, hence the proposed parameter to 
control it.

bq. Only potential problem is if his create request is run when not all Solr 
servers are running, and in such case a maxShardsPerNode could help to stop the 
creation process.

Exactly... there's the main use case.

Example: you have 24 servers and create a collection with 8 shards and a target 
replication factor of 3... but one of the servers goes down in the meantime so 
one shard has only 2 replicas.  It's entirely reasonable for a user to want to 
wait until that machine comes back up rather than doubling up on a different 
node.

The other use case is the examples on http://wiki.apache.org/solr/SolrCloud

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster 
> (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another that just joined the 
> cluster than it is to split an existing shard between the Solr server that 
> used to run it and the new one.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (SOLR-4032) Unable to replicate between nodes ( read past EOF)

2012-11-27 Thread eksdev
Thanks, Mark!


On Nov 27, 2012, at 8:43 PM, Mark Miller (JIRA)  wrote:

> 
>[ 
> https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504875#comment-13504875
>  ] 
> 
> Mark Miller commented on SOLR-4032:
> ---
> 
> I'll try and make a fix soon.
> 
>> Unable to replicate between nodes ( read past EOF)
>> --
>> 
>>Key: SOLR-4032
>>URL: https://issues.apache.org/jira/browse/SOLR-4032
>>Project: Solr
>> Issue Type: Bug
>> Components: SolrCloud
>>   Affects Versions: 4.0
>>Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
>> 12:37:38
>> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>>   Reporter: Markus Jelsma
>>   Assignee: Mark Miller
>>Fix For: 4.1, 5.0
>> 
>> 
>> Please see: 
>> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>>  and 
>> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html
> 
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504881#comment-13504881
 ] 

Per Steffensen commented on SOLR-4114:
--

I learned from Steve today that you usually develop for 5.x on trunk and then 
back-port to the 4.x.y branches. Let me know if you would like a trunk-based 
patch instead.

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4-Solr-server cluster 
> (each Solr server running 2 shards).
> Performance tests on our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another that just joined the 
> cluster than it is to split an existing shard between the Solr server that 
> used to run it and the new one.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4032) Unable to replicate between nodes ( read past EOF)

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504875#comment-13504875
 ] 

Mark Miller commented on SOLR-4032:
---

I'll try and make a fix soon.

> Unable to replicate between nodes ( read past EOF)
> --
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4032) Unable to replicate between nodes ( read past EOF)

2012-11-27 Thread Eks Dev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504859#comment-13504859
 ] 

Eks Dev commented on SOLR-4032:
---

We see it as well.

It looks like it only happens with NRTCachingDirectory, but take this 
statement with healthy suspicion - we have only completed one run without 
NRTCachingDirectory, and that one went OK.
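
One way to firm that suspicion up is to temporarily swap the directory factory
in solrconfig.xml (standard factory classes, no custom code):

{code:xml}
<!-- rule NRTCachingDirectory in or out by switching the factory -->
<directoryFactory name="DirectoryFactory"
                  class="solr.StandardDirectoryFactory"/>
{code}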




> Unable to replicate between nodes ( read past EOF)
> --
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit

2012-11-27 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504854#comment-13504854
 ] 

Robert Muir commented on LUCENE-4236:
-

There are a lot of things I'm not happy with in the patch; I think it was more 
of an exploration of ideas.

I think we could split out the cost/hitcount/conjunctionscorer idea into a 
separate issue as a start?

This would keep things contained.

> clean up booleanquery conjunction optimizations a bit
> -
>
> Key: LUCENE-4236
> URL: https://issues.apache.org/jira/browse/LUCENE-4236
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 4.1
>
> Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, 
> LUCENE-4236.patch
>
>
> After LUCENE-3505, I want to do a slight cleanup:
> * compute the term conjunctions optimization in scorer(), so it's applied 
> even if we have optional and prohibited clauses that don't exist in the 
> segment (e.g. return null)
> * use the term conjunctions optimization when optional.size() == 
> minShouldMatch, as that means they are all mandatory, too.
> * don't return booleanscorer1 when optional.size() == minShouldMatch, because 
> it means we have required clauses and in general BS2 should do a much better 
> job (e.g. use advance).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit

2012-11-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504827#comment-13504827
 ] 

Michael McCandless commented on LUCENE-4236:


Maybe we should make it .estimateHitCount instead of estimateCost, so it's more 
explicit?

> clean up booleanquery conjunction optimizations a bit
> -
>
> Key: LUCENE-4236
> URL: https://issues.apache.org/jira/browse/LUCENE-4236
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 4.1
>
> Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, 
> LUCENE-4236.patch
>
>
> After LUCENE-3505, I want to do a slight cleanup:
> * compute the term conjunctions optimization in scorer(), so it's applied 
> even if we have optional and prohibited clauses that don't exist in the 
> segment (e.g. return null)
> * use the term conjunctions optimization when optional.size() == 
> minShouldMatch, as that means they are all mandatory, too.
> * don't return booleanscorer1 when optional.size() == minShouldMatch, because 
> it means we have required clauses and in general BS2 should do a much better 
> job (e.g. use advance).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4236) clean up booleanquery conjunction optimizations a bit

2012-11-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504825#comment-13504825
 ] 

Michael McCandless commented on LUCENE-4236:


+1

This patch looks great!

It cleans up BS2 and specialized term conjunction scorer, and makes more 
accurate decisions about which sub-scorer to enumerate first (no more first 
docID heuristic).

We could also use the cost estimate to sometimes let BooleanScorer take MUST 
clauses.
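
A toy sketch of the estimate-driven ordering (not Lucene's classes): sort the
required sub-scorers by estimated hit count so the rarest clause leads and the
denser ones are only consulted via advance():

{code}
import java.util.*;

public class ConjunctionOrderDemo {
  static class SubScorer {
    final String term;
    final long estimatedHitCount; // e.g. the term's docFreq
    SubScorer(String term, long estimatedHitCount) {
      this.term = term;
      this.estimatedHitCount = estimatedHitCount;
    }
  }

  public static void main(String[] args) {
    List<SubScorer> required = new ArrayList<SubScorer>(Arrays.asList(
        new SubScorer("the", 9000000L),
        new SubScorer("lucene", 40000L),
        new SubScorer("conjunction", 900L)));

    // Lead with the cheapest clause: each candidate doc it produces is
    // checked against the denser clauses with advance(), skipping most docs.
    Collections.sort(required, new Comparator<SubScorer>() {
      public int compare(SubScorer a, SubScorer b) {
        return a.estimatedHitCount < b.estimatedHitCount ? -1
             : a.estimatedHitCount > b.estimatedHitCount ? 1 : 0;
      }
    });
    System.out.println(required.get(0).term); // "conjunction" drives the loop
  }
}
{code}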

> clean up booleanquery conjunction optimizations a bit
> -
>
> Key: LUCENE-4236
> URL: https://issues.apache.org/jira/browse/LUCENE-4236
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Robert Muir
> Fix For: 4.1
>
> Attachments: LUCENE-4236.patch, LUCENE-4236.patch, LUCENE-4236.patch, 
> LUCENE-4236.patch
>
>
> After LUCENE-3505, I want to do a slight cleanup:
> * compute the term conjunctions optimization in scorer(), so it's applied 
> even if we have optional and prohibited clauses that don't exist in the 
> segment (e.g. return null)
> * use the term conjunctions optimization when optional.size() == 
> minShouldMatch, as that means they are all mandatory, too.
> * don't return booleanscorer1 when optional.size() == minShouldMatch, because 
> it means we have required clauses and in general BS2 should do a much better 
> job (e.g. use advance).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: composition of different queries based scores

2012-11-27 Thread Jack Krupansky
The fuzzy option will be ignored here – you cannot combine fuzzy and wildcard 
on the same term, although you could do an OR of the two:

(hello* OR hello~)
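
In Lucene 4.x API terms, that OR looks roughly like this (the field name is a
placeholder):

{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class FuzzyOrWildcard {
  public static Query build(String field) {
    BooleanQuery bq = new BooleanQuery();
    // prefix/wildcard match on hello*
    bq.add(new WildcardQuery(new Term(field, "hello*")),
           BooleanClause.Occur.SHOULD);
    // fuzzy (edit-distance) match on hello~
    bq.add(new FuzzyQuery(new Term(field, "hello")),
           BooleanClause.Occur.SHOULD);
    return bq;
  }
}
{code}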

-- Jack Krupansky

From: sri krishna 
Sent: Tuesday, November 27, 2012 11:08 AM
To: dev@lucene.apache.org 
Subject: composition of different queries based scores

For a search string hello*~, how is the scoring calculated?

The formula given at 
http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/search/Similarity.html
 does not take the edit-distance and prefix-term factors into account.

Does Lucene add up the scores obtained from each type of query included, i.e. 
for the above query, actual score = default score + 1/(edit distance) + prefix 
match score? If so, there is no normalization between the scores. Otherwise, 
what is the approach Lucene follows, from separating the query identifiers 
(~ (edit distance), * (prefix query), etc.) to the actual scoring?






[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.8.0-ea-b58) - Build # 2877 - Still Failing!

2012-11-27 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/2877/
Java: 32bit/jdk1.8.0-ea-b58 -server -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 4712 lines...]
[junit4:junit4] ERROR: JVM J1 ended with an exception, command line: 
/mnt/ssd/jenkins/tools/java/32bit/jdk1.8.0-ea-b58/jre/bin/java -server 
-XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/heapdumps 
-Dtests.prefix=tests -Dtests.seed=748F06EFFB990338 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false 
-Dtests.lockdir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build 
-Dtests.codec=random -Dtests.postingsformat=random -Dtests.locale=random 
-Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=4.1 
-Dtests.cleanthreads=perMethod 
-Djava.util.logging.config.file=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=3 -DtempDir=. 
-Djava.io.tmpdir=. 
-Dtests.sandbox.dir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/analysis/common
 
-Dclover.db.dir=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/junit4/tests.policy
 -Dlucene.version=4.1-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dfile.encoding=ISO-8859-1 -classpath 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/analysis/common/classes/test:/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/test-framework/classes/java:/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/codecs/classes/java:/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/core/classes/java:/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/test-framework/lib/junit-4.10.jar:/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/test-framework/lib/randomizedtesting-runner-2.0.4.jar:/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/analysis/common/classes/java:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-launcher.jar:/var/lib/jenkins/.ant/lib/ivy-2.2.0.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-jdepend.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-netrexx.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-antlr.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-commons-net.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-javamail.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-apache-regexp.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-jsch.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-apache-xalan2.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-junit4.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-jmf.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-junit.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-apache-bcel.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-jai.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-commons-logging.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-apache-resolver.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-apache-oro.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-swing.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-apache-bsf.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-apache-log4j.jar:/var/lib/jenkins/tools/Ant/ANT_1.8.2/lib/ant-testutil.jar:/mnt/ssd/jenkins/tools/java/32bit/jdk1.8.0-ea-b58/lib/tools.jar:/var/lib/jenkins/.ivy2/cache/com.carrotsearch.randomizedtesting/junit4-ant/jars/junit4-ant-2.0.4.jar
 -ea:org.apache.lucene... -ea:org.apache.solr... 
com.carrotsearch.ant.tasks.junit4.slave.SlaveMainSafe -eventsfile 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/analysis/common/test/junit4-J1-20121127_171044_810.events
 
@/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/build/analysis/common/test/junit4-J1-20121127_171044_810.suites
[junit4:junit4] ERROR: JVM J1 ended with an exception: Forked process returned 
with error code: 134 Very likely a JVM crash.  Process output piped in logs 
above.
[junit4:junit4] at 
com.carrotsearch.ant.tasks.junit4.JUnit4.executeSlave(JUnit4.java:1206)
[junit4:junit4] at 
com.carrotsearch.ant.tasks.junit4.JUnit4.access$000(JUnit4.java:65)
[junit4:junit4] at 
com.carrotsearch.ant.tasks.junit4.JUnit4$2.call(JUnit4.java:813)
[junit4:junit4] at 
com.carrotsearch.ant.tasks.junit4.JUnit4$2.call(JUnit4.java:810)
[junit4:junit4] at 
java.util.concurrent.FutureTask.run(FutureTask.java:262)
[junit4:junit4] at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
[junit4:junit4] at 
java.util.concurrent.Thre

[jira] [Commented] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504768#comment-13504768
 ] 

Michael McCandless commented on LUCENE-4576:


I ran a quick perf test (searching on 'the' on a 10M-doc Wikipedia index), 
with periodic reopen, and the perf gain is negligible (< 1%), so +1 to nuke 
this!  I agree it's trappy...

> Remove CachingWrapperFilter recacheDeletes boolean
> --
>
> Key: LUCENE-4576
> URL: https://issues.apache.org/jira/browse/LUCENE-4576
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
> Attachments: LUCENE-4576.patch
>
>
> I think this option is bad news, it's just a trap that causes caches to be 
> uselessly invalidated.
> If you really have a totally static index then just expunge your deletes.
> Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-trunk-Java6 - Build # 15600 - Failure

2012-11-27 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java6/15600/

All tests passed

Build Log:
[...truncated 7336 lines...]
BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java6/build.xml:90:
 The following files contain @author tags, tabs or nocommits:
* 
solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestVariableResolverEndToEnd.java

Total time: 18 minutes 18 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Assigned] (SOLR-4047) dataimporter.functions.encodeUrl throws Unable to encode expression: field.name with value: null

2012-11-27 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer reassigned SOLR-4047:


Assignee: James Dyer

> dataimporter.functions.encodeUrl throws Unable to encode expression: 
> field.name with value: null
> --
>
> Key: SOLR-4047
> URL: https://issues.apache.org/jira/browse/SOLR-4047
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
> Environment: Windows 7
>Reporter: Igor Dobritskiy
>Assignee: James Dyer
>Priority: Critical
> Attachments: db-data-config.xml, db.sql, schema.xml, solrconfig.xml
>
>
> For some reason dataimporter.functions.encodeUrl stopped working after 
> updating to Solr 4.0 from 3.5.
> Here is the error
> {code}
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
> encode expression: attach.name with value: null Processing Document # 1
> {code}
> Here is the data import config snippet:
> {code}
> ...
> <!-- entity tags below were stripped by the mail archiver; element names
> are placeholders, the attributes are original -->
> <entity name="account" query="select name from accounts where account_id = 
> '${attach.account_id}'">
> <entity name="attachment" dataSource="bin" format="text" 
> url="http://example.com/data/${account.name}/attaches/${attach.item_id}/${dataimporter.functions.encodeUrl(attach.name)}">
> </entity>
> </entity>
> ...
> {code}
> When I change it to *not* use dataimporter.functions.encodeUrl it works, but 
> I need to URL-encode the file names as they have special chars in their 
> names.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.6.0_37) - Build # 2886 - Failure!

2012-11-27 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/2886/
Java: 64bit/jdk1.6.0_37 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 7319 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:90: The following 
files contain @author tags, tabs or nocommits:
* 
solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestVariableResolverEndToEnd.java

Total time: 14 minutes 16 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.6.0_37 -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4047) dataimporter.functions.encodeUrl throws Unable to encode expression: field.name with value: null

2012-11-27 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504762#comment-13504762
 ] 

James Dyer commented on SOLR-4047:
--

Igor, I just committed a fix for SOLR-2141 & SOLR-3842 that also includes a 
test demonstrating this issue. However, this test passes and I'm not sure 
anything is actually broken, at least not on the latest revision in Trunk or 
Branch_4x. Note, though, that this test does not use Tika. However, the code 
for resolving the Tika URL is similar to the code for the other entity 
processors and it should work the same.

See TestVariableResolverEndToEnd, which generates a data-config.xml like this:

{code}
<!-- data-config.xml stripped by the mail archiver. Per the surrounding
     prose, it defines a parent entity FIRST exposing a SELECT_KEYWORD
     column, and a child entity whose SQL query begins with
     ${dataimporter.functions.encodeUrl(FIRST.SELECT_KEYWORD)} in place of
     the literal word "select". -->
{code}

As you can see, the SQL query on the child entity, instead of having a literal 
"select", uses ${dataimporter.functions.encodeUrl(FIRST.SELECT_KEYWORD)}, 
getting the word "select" from the data in the parent entity.

The response shows it is correctly executing the inner entity:
{code}
  "response":{"numFound":1,"start":0,"docs":[
  {
"select_keyword_s":"SELECT",
"id":"1",
"second3_s":"2012",
"second2_s":"2012",
"PORK_s":"GRILL",
"BEEF_CUTS_mult_s":["ROUND",
  "SIRLOIN"],
"second1_s":"2012",
"FISH_s":"FRY",
"timestamp":"2012-11-27T16:55:39.409Z"}]
  }
{code}

Unless someone can demonstrate this is an actual problem (once again, a good 
failing unit test would help a lot), I will close this as "not a problem" in 
the next week or so.
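
For anyone following along, what encodeUrl boils down to is: resolve the 
variable, then URL-encode the resolved value, so a null resolution is the only 
way to hit the reported error.  A simplified sketch, not the actual DIH code - 
the class and the map-based lookup here are made up for illustration:

{code}
import java.net.URLEncoder;
import java.util.HashMap;
import java.util.Map;

// Simplified sketch of ${dataimporter.functions.encodeUrl(attach.name)}:
// resolve the variable against the current row, then URL-encode it.
public class EncodeUrlSketch {
  static String encodeUrl(Map<String, Object> row, String expr) throws Exception {
    Object value = row.get(expr); // stands in for the VariableResolver lookup
    if (value == null) {
      // the failure mode from the original report
      throw new RuntimeException("Unable to encode expression: " + expr
          + " with value: null");
    }
    return URLEncoder.encode(value.toString(), "UTF-8");
  }

  public static void main(String[] args) throws Exception {
    Map<String, Object> row = new HashMap<String, Object>();
    row.put("attach.name", "file with #special chars.pdf");
    System.out.println(encodeUrl(row, "attach.name"));
    // prints: file+with+%23special+chars.pdf
  }
}
{code}

So if the error persists on a current build, the interesting question is why 
the resolver returns null for attach.name at that point in the import.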

> dataimporter.functions.encodeUrl throws Unable to encode expression: 
> field.name with value: null
> --
>
> Key: SOLR-4047
> URL: https://issues.apache.org/jira/browse/SOLR-4047
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
> Environment: Windows 7
>Reporter: Igor Dobritskiy
>Priority: Critical
> Attachments: db-data-config.xml, db.sql, schema.xml, solrconfig.xml
>
>
> For some reason dataimporter.functions.encodeUrl stopped working after the 
> update to Solr 4.0 from 3.5.
> Here is the error
> {code}
> Full Import failed:java.lang.RuntimeException: java.lang.RuntimeException: 
> org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
> encode expression: attach.name with value: null Processing Document # 1
> {code}
> Here is the data import config snippet:
> {code}
> ...
> <entity name="account"
>         query="select name from accounts where account_id = '${attach.account_id}'">
>   <entity name="..." processor="..."
>           dataSource="bin"
>           format="text"
>           url="http://example.com/data/${account.name}/attaches/${attach.item_id}/${dataimporter.functions.encodeUrl(attach.name)}">
>   </entity>
> </entity>
> ...
> {code}
> When I change it to *not* use dataimporter.functions.encodeUrl it works, 
> but I need to URL-encode file names as they have special chars in their 
> names.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3842) DataImportHandler: When attempting to use column values as field names, multivalued fields only retain the first result

2012-11-27 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer resolved SOLR-3842.
--

   Resolution: Fixed
Fix Version/s: 5.0
   4.1
 Assignee: James Dyer

fixed with SOLR-2141.

> DataImportHandler: When attempting to use column values as field names, 
> multivalued fields only retain the first result
> ---
>
> Key: SOLR-3842
> URL: https://issues.apache.org/jira/browse/SOLR-3842
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0-BETA
>Reporter: Eric Kingston
>Assignee: James Dyer
> Fix For: 4.1, 5.0
>
>
> Can you please verify if this issue is simply due to improper implementation, 
> or if it's a bug in Solr?
> http://stackoverflow.com/questions/12412040/solr-dataimporthandler-when-attempting-to-use-column-values-as-field-names-m
> Also, I should note that a static reference to fieldB does handle multiple 
> values correctly.
> Example: (field definition stripped by the mail archive)
> It only fails when I try to set the field names dynamically.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-trunk-Windows (64bit/jdk1.7.0_09) - Build # 1929 - Failure!

2012-11-27 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Windows/1929/
Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 7961 lines...]
BUILD FAILED
C:\Users\JenkinsSlave\workspace\Lucene-Solr-trunk-Windows\build.xml:90: The 
following files contain @author tags, tabs or nocommits:
* 
solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestVariableResolverEndToEnd.java

Total time: 23 minutes 56 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-4032) Unable to replicate between nodes ( read past EOF)

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504751#comment-13504751
 ] 

Mark Miller commented on SOLR-4032:
---

Okay, I can reproduce this in a test now - the test just has to index many docs 
- I tried 1 million just to be sure and I see issues - I think I was trying in 
the tens of thousands before.

> Unable to replicate between nodes ( read past EOF)
> --
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-2141) NullPointerException when using escapeSql function

2012-11-27 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer resolved SOLR-2141.
--

Resolution: Fixed

committed.

Trunk: r1414242
4x: r1414250

> NullPointerException when using escapeSql function
> --
>
> Key: SOLR-2141
> URL: https://issues.apache.org/jira/browse/SOLR-2141
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4.1, 4.0
> Environment: openjdk 1.6.0 b12
>Reporter: Edward Rudd
>Assignee: James Dyer
> Fix For: 4.1, 5.0
>
> Attachments: dih-config.xml, dih-file.xml, SOLR-2141.b341f5b.patch, 
> SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, 
> SOLR-2141.patch, SOLR-2141.patch, SOLR-2141-sample.patch, SOLR-2141-test.patch
>
>
> I have two entities defined, nested in each other..
> (entity definitions stripped by the mail archive)
> Now, when I run that it bombs on any article where subcategory = '' (it's a 
> NOT NULL column so the empty string is there).  If I add where subcategory!='' 
> to the article query it works fine (aside from not pulling in all of the 
> articles).
> org.apache.solr.handler.dataimport.DataImportHandlerException: 
> java.lang.NullPointerException
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
> at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
> at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
> at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:75)
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:216)
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:204)
> at 
> org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:107)
> at 
> org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
> at 
> org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
> at 
> org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:87)
> at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
> at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
> ... 6 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (32bit/jdk1.7.0_09) - Build # 2876 - Failure!

2012-11-27 Thread Policeman Jenkins Server
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-4.x-Linux/2876/
Java: 32bit/jdk1.7.0_09 -server -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 7848 lines...]
BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:90: The following 
files contain @author tags, tabs or nocommits:
* 
solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestVariableResolverEndToEnd.java

Total time: 16 minutes 12 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.7.0_09 -server -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: Fullmetal Jenkins: Solr4X - Build # 255 - Failure!

2012-11-27 Thread Mark Miller
/ext3space/jenkins/workspace/Solr4X/build.xml:90: The following files contain 
@author tags, tabs or nocommits:
* 
solr/contrib/dataimporthandler/src/test/org/apache/solr/handler/dataimport/TestVariableResolverEndToEnd.java

Total time: 3 seconds

On Nov 27, 2012, at 11:30 AM, nore...@fullmetaljenkins.org wrote:

> Solr4X - Build # 255 - Failure:
> 
> Check console output at http://fullmetaljenkins.org/job/Solr4X/255/ to view 
> the results.
> 
> No tests ran.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-2052) Allow for a list of filter queries and a single docset filter in QueryComponent

2012-11-27 Thread Aaron Daubman (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504738#comment-13504738
 ] 

Aaron Daubman commented on SOLR-2052:
-

I am successfully using patch SOLR-2052-4_0_0 against 
https://github.com/apache/lucene-solr/tree/lucene_solr_4_0

ant clean test
reports:
BUILD SUCCESSFUL
Total time: 10 minutes 30 seconds

What else can I do to help ensure this patch makes it into 4.1 (or the next 
bump to lucene_solr_4_0)?

> Allow for a list of filter queries and a single docset filter in 
> QueryComponent
> ---
>
> Key: SOLR-2052
> URL: https://issues.apache.org/jira/browse/SOLR-2052
> Project: Solr
>  Issue Type: Improvement
>  Components: search
>Affects Versions: 4.0-ALPHA
> Environment: Mac OS X, Java 1.6
>Reporter: Stephen Green
>Priority: Minor
> Fix For: 4.1
>
> Attachments: SOLR-2052-2.patch, SOLR-2052-3-6-1.patch, 
> SOLR-2052-3.patch, SOLR-2052-4_0_0.patch, SOLR-2052-4.patch, SOLR-2052.patch, 
> SOLR-2052-trunk.patch
>
>
> SolrIndexSearcher.QueryCommand allows you to specify a list of filter queries 
> or a single filter (as a DocSet), but not both.  This restriction seems 
> arbitrary, and there are cases where we can have both a list of filter 
> queries and a DocSet generated by some other non-query process (e.g., 
> filtering documents according to IDs pulled from some other source like a 
> database.)
> Fixing this requires a few small changes to SolrIndexSearcher to allow both 
> of these to be set for a QueryCommand and to take both into account when 
> evaluating the query.  It also requires a modification to ResponseBuilder to 
> allow setting the single filter at query time.
> I've run into this against 1.4, but the same holds true for the trunk.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Attachment: SOLR-4114.patch

About SOLR-4114.patch:
* It fits on top of revision 1412602 of branch lucene_solr_4_0
* The shard allocation algorithm explained
** Shards are allocated to Solr servers one by one. The next shard is always 
assigned to the next server in a shuffled list of live servers. Whenever you 
reach the end of the list of live servers you start over again.
** Replicas for a certain shard are allocated to the #replication-factor next 
servers in the list
** replication-factor is reduced if it is requested to be higher than "the 
number of live servers - 1". Kinda pointless to run two shards belonging to the 
same slice on the same server
*** Unfortunately we are only able to log the decision about such a 
replication-factor reduction - there is no easy way to get info back to the 
caller since the job is handled asynchronously by the Overseer
* Besides that, a bug-fix is included
** OverseerCollectionProcessor.createCollection and .collectionCmd reused 
params-objects too much. The same params-object was used for several submits to 
ShardHandler, but since the ShardHandler issues asynchronous jobs, the 
params-object might be changed by the OverseerCollectionProcessor before the 
asynchronous job is executed - resulting in a lot of fun :-) Comments added 
around the fixes
*** This bug does not appear to be fixed on lucene_solr_4_0
*** It appears to be partly fixed on branch_4x - fixed in collectionCmd (used 
for delete and reload) but not in createCollection (used for create)
* Besides that, a little cleaning up - I know you don't like it, but my eyes 
cannot handle such a mess :-)
** BasicDistributedZkTest: Introduced method getCommonCloudSolrServer to be 
used instead of just using solrj. The solrj variable was initialized in method 
queryServer but used in lots of other places. For this to work your test needs 
to call queryServer before any of the other methods using solrj. This is 
fragile when you change the test, or if you (as I did) comment out parts of 
the test.
** HttpShardHandler: Made getURLs thread-safe so that you do not have to be so 
careful using it
** General: Took a small step towards consistent usage of the terms collection, 
node-name, node-base-url, slice, shard and replica. All over the code the terms 
are mixed up; I took the opportunity to clean up the code near my changes. 
IMHO you should do a lot more cleaning up in this project. I will try to sneak 
in clean-ups whenever I can :-) My view on the correct meaning of the terms
*** collection: A big logical bucket to fill data into
*** slice: A logical part of a collection. A part of the data going into a 
collection goes into a particular slice. Slices for a particular collection are 
non-overlapping
*** shard: A physical instance of a slice. Running without replicas there is 
one shard per slice. Running with replication-factor X there are X+1 shards per 
slice.
*** node-base-url: The prefix/base (up to and including the webapp-context) of 
the URL for a specific Solr server
*** node-name: A logical name for the Solr server - the same as node-base-url 
except /'s are replaced by _'s and the protocol part (http(s)://) is removed

If you don't want the cleaning-up stuff, the following parts of the patch can 
be left out
* BasicDistributedZkTest: Everything except maybe the change from "new 
ZkCoreNodeProps(node).getCoreUrl()" to 
"ZkCoreNodeProps.getCoreUrl(node.getStr(ZkStateReader.BASE_URL_PROP), 
collection)" in method getUrlFromZk
* ShardHandler: Everything
* HttpShardHandler: Everything
* OverseerCollectionProcessor: The renaming stuff

The important stuff is in OverseerCollectionProcessor - the modified shard 
allocation algorithm that allows for multiple shards from the same collection 
on each Solr server, and the bug-fix dealing with too-eager reuse of 
params-objects.
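
To make the allocation concrete, here is a minimal sketch of the algorithm 
described above (illustration only, not the patch code - the class and method 
names are made up):

{code}
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: shuffle the live servers once, then hand out each slice's shards
// (leader plus replicas) to consecutive servers, wrapping around at the end.
public class ShardAllocationSketch {
  public static List<List<String>> allocate(List<String> liveServers,
                                            int numSlices, int replicationFactor) {
    List<String> shuffled = new ArrayList<String>(liveServers);
    Collections.shuffle(shuffled);
    // reduce the replication factor to "live servers - 1", as described above
    int repl = Math.min(replicationFactor, shuffled.size() - 1);
    List<List<String>> assignments = new ArrayList<List<String>>();
    int pos = 0;
    for (int slice = 0; slice < numSlices; slice++) {
      List<String> serversForSlice = new ArrayList<String>();
      for (int i = 0; i <= repl; i++) { // one leader + repl replicas
        serversForSlice.add(shuffled.get(pos % shuffled.size()));
        pos++;
      }
      assignments.add(serversForSlice);
    }
    return assignments;
  }
}
{code}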

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
> Attachments: SOLR-4114.patch
>
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-27 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504724#comment-13504724
 ] 

Jack Krupansky commented on SOLR-139:
-

I just tried it and the feature does work as advertised. If there is a bug, 
that should be filed as a separate issue. If there is a question or difficulty 
using the feature, that should be pursued on the Solr user list.

For reference, I took a fresh, stock copy of the Solr 4.0 example, no changes 
to schema or config, and added one document:

{code}
curl 'localhost:8983/solr/update?commit=true' -H 
'Content-type:application/json' -d '
[{"id":"id-123","title":"My original Title", "content": "Initial content"}]'
{code}

I queried it and it looked fine.

I then modified only the title field:

{code}
curl 'localhost:8983/solr/update?commit=true' -H 
'Content-type:application/json' -d '
[{"id":"id-123","title":{"set":"My new title"}}]'
{code}

I tried the XML equivalents and that worked fine as well, with the original 
content field preserved.
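
For completeness, the same "set" operation via SolrJ looks roughly like this 
(a sketch, assuming the stock example server on localhost:8983; the map-valued 
field is what marks it as an atomic update):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class AtomicUpdateSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "id-123");
    Map<String, Object> setOp = new HashMap<String, Object>();
    setOp.put("set", "My new title"); // "set" replaces just this field
    doc.addField("title", setOp);
    server.add(doc);
    server.commit();
  }
}
{code}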


> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
> SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way lucene is structured, (for now) one can only modify stored 
> fields.
> While we are at it, we can support incrementing an existing value - I think 
> this only makes sense for numbers.
> for background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: How does lucene handle the wildcard and fuzzy queries internally?

2012-11-27 Thread sri krishna
It looks like the Levenshtein automaton was introduced in the new version of
Lucene; earlier it used a brute-force approach.

1) How are prefix queries handled?

2) In general, there is a sorted term list which is mapped to the doc-ids in
which the corresponding terms occur (the inverted index). Does Lucene have
any data structure to store this term list for efficient search? Storing it
in some form of balanced binary search tree or trie needs serialising and
de-serialising every time it is accessed, which is a very expensive task, as
it needs a complete scan of all the data.
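
For reference, the query types in question are constructed like this in the
4.x API (just a usage sketch; the questions above are about the data
structures behind them):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.PrefixQuery;
import org.apache.lucene.search.WildcardQuery;

public class MultiTermQuerySketch {
  public static void main(String[] args) {
    // FuzzyQuery is the one backed by the Levenshtein automaton since 4.0;
    // the maximum supported edit distance is 2.
    FuzzyQuery fuzzy = new FuzzyQuery(new Term("title", "lucene"), 2);
    PrefixQuery prefix = new PrefixQuery(new Term("title", "luc"));
    WildcardQuery wildcard = new WildcardQuery(new Term("title", "lu*ne"));
    System.out.println(fuzzy + " " + prefix + " " + wildcard);
  }
}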


On Tue, Nov 27, 2012 at 2:50 PM, Federico Méndez wrote:

> As an introduction you can read this wonderful article:
> http://java.dzone.com/news/lucenes-fuzzyquery-100-times
>
>
> On Tue, Nov 27, 2012 at 10:08 AM, sri krishna wrote:
>
>>
>> How does Lucene handle wildcard and fuzzy queries internally?
>>
>> It looks like the data is stored as term->posting list. What data
>> structures are used to generate efficient results?
>>
>> If it is using a compressed trie, how does it handle segment merging
>> efficiently? If it is using just a linear scan to find the words in a
>> query, how are prefix-based terms found?  Can anyone give more detailed
>> explanations of how such advanced queries are handled in Lucene from an
>> efficiency point of view?
>>
>>
>> Thanks
>
>
>


Re: Active 4.x branches?

2012-11-27 Thread Per Steffensen
Thanks a lot Steve. I have done my best to get the HowToContribute page up to 
date wrt the current "state of affairs". Of course feel free to 
change it if I did something wrong.


Regards, Per Steffensen

Steve Rowe wrote:

Per,

The standard way to develop in Lucene and Solr is against trunk, then backport 
to the stable branch (currently branch_4x), and then to any applicable bugfix 
branches if such branches are expected to be released.  (If a change is too 
radical for the stable branch, it will of course not be backported.)

About maintaining documentation and your putting in effort there: I agree with 
Mark Miller's characterization of people who become committers: first they act 
like committers (e.g. taking responsibility for things they think are 
important; working with other community members to reach agreement; sustained 
substantial contributions; etc.), then PMC member(s) take note and make them a 
committer.

Don't get hung up waiting for a committer to contribute, especially in 
documentation - in fact people new to the process are uniquely positioned to 
recognize gaps here - there is a lot you can do without commit rights.

Steve



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-2141) NullPointerException when using escapeSql function

2012-11-27 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-2141:
-

Attachment: SOLR-2141.patch

final patch to commit, fixes a locale problem in the unit test.

Dominik, this patch solves the absence of the dih.functions. namespace, so you 
can use either "dih." or "dataimporter." as before.  Also, it solves the related 
problem on SOLR-3842.  Finally, it has a pretty good unit test that demonstrates 
this issue, SOLR-3842 and SOLR-4047 (this last one doesn't appear to actually 
be broken).

> NullPointerException when using escapeSql function
> --
>
> Key: SOLR-2141
> URL: https://issues.apache.org/jira/browse/SOLR-2141
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4.1, 4.0
> Environment: openjdk 1.6.0 b12
>Reporter: Edward Rudd
>Assignee: James Dyer
> Fix For: 4.1, 5.0
>
> Attachments: dih-config.xml, dih-file.xml, SOLR-2141.b341f5b.patch, 
> SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, 
> SOLR-2141.patch, SOLR-2141.patch, SOLR-2141-sample.patch, SOLR-2141-test.patch
>
>
> I have two entities defined, nested in each other..
> (entity definitions stripped by the mail archive)
> Now, when I run that it bombs on any article where subcategory = '' (it's a 
> NOT NULL column so the empty string is there).  If I add where subcategory!='' 
> to the article query it works fine (aside from not pulling in all of the 
> articles).
> org.apache.solr.handler.dataimport.DataImportHandlerException: 
> java.lang.NullPointerException
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
> at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
> at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
> at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:75)
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:216)
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:204)
> at 
> org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:107)
> at 
> org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
> at 
> org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
> at 
> org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:87)
> at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
> at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
> ... 6 more

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4576:


Attachment: LUCENE-4576.patch

> Remove CachingWrapperFilter recacheDeletes boolean
> --
>
> Key: LUCENE-4576
> URL: https://issues.apache.org/jira/browse/LUCENE-4576
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
> Attachments: LUCENE-4576.patch
>
>
> I think this option is bad news, its just a trap that causes caches to be 
> uselessly invalidated.
> If you really have a totally static index then just expunge your deletes.
> Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504698#comment-13504698
 ] 

Per Steffensen commented on SOLR-4114:
--

Well I see no reason to introduce (in the first step at least) a 
maxShardsPerNode. Your requested numShards and the number of live servers at 
that point in time will decide the number of shards on each node/server - 
basically no limit, but the clever user of Solr will probably not want to 
request 1000 shards for a collection if he only has 2 Solr servers, 
but it should actually be up to the user of Solr. We have been using SolrCloud 
for a long time now and we have a very high focus on performance, because we 
need to end up with a Solr cluster supporting "live-searches" among 50-100 
billion records. During numerous performance tests we have, among other things, 
played with the number of shards per Solr server per collection. We have run 
one-month+ tests just pumping data into the cluster to see how loading-time, 
search-time etc. develop as collections are filled with data. We have run such 
tests with 1, 4, 8 and 12 shards per Solr server per collection, and each of 
them has both good and bad properties wrt performance, so until we know (and 
we should be very careful taking "good decisions on behalf of every Solr 
user") that there is an always-true best number for maxShardsPerNode, we 
should be careful putting any limit on the user.

I know you just want to give a maxShardsPerNode in the create request, but the 
user of Solr really should be able to calculate the number of shards going on 
each Solr server when he controls numShards and knows how many Solr servers he 
is running. The only potential problem is if his create request is run when 
not all Solr servers are running, and in such a case a maxShardsPerNode could 
help to stop the creation process.

But a Solr user probably wants to make sure all Solr servers that are supposed 
to run are actually running before he issues a collection creation request, 
so that he gets shards distributed across all the Solr servers he intends to 
run. We do that in our project BTW, but outside Solr code.

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-1920) Need generic placemarker for DIH delta-import

2012-11-27 Thread Alexey Serba (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504675#comment-13504675
 ] 

Alexey Serba commented on SOLR-1920:


bq. If Solr could be informed that field X is the tracking field, the highest 
value encountered during an import (according to that field's sort mechanism) 
could be stored in dataimport.properties and re-used during the next 
delta-import.
+1

So you don't have to sort your documents by the autoincrement field or run 
another special query; instead you just keep the highest value for this field 
(should it support a decrementing strategy? not sure).
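
A minimal sketch of what such a placemarker could look like (illustration 
only - the property key and the file handling are made up):

{code}
import java.io.FileOutputStream;
import java.util.Properties;

// Sketch: track the highest value seen for the tracking field during an
// import and persist it for the next delta-import to pick up.
public class PlacemarkerSketch {
  private long highest = Long.MIN_VALUE;

  public void onRow(long trackingValue) {
    if (trackingValue > highest) {
      highest = trackingValue;
    }
  }

  public void persist() throws Exception {
    Properties props = new Properties();
    props.setProperty("last_index_value", String.valueOf(highest));
    FileOutputStream out = new FileOutputStream("dataimport.properties");
    try {
      props.store(out, "DIH placemarker");
    } finally {
      out.close();
    }
  }

  public static void main(String[] args) throws Exception {
    PlacemarkerSketch p = new PlacemarkerSketch();
    p.onRow(42L);
    p.onRow(7L);
    p.persist(); // writes last_index_value=42
  }
}
{code}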

> Need generic placemarker for DIH delta-import
> -
>
> Key: SOLR-1920
> URL: https://issues.apache.org/jira/browse/SOLR-1920
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Reporter: Shawn Heisey
>Priority: Minor
> Fix For: 4.1
>
>
> The dataimporthandler currently is only capable of saving the index timestamp 
> for later use in delta-import commands.  It should be extended to allow any 
> arbitrary data to be used as a placemarker for the next import.
> It is possible to use externally supplied variables in data-config.xml and 
> send values in via the URL that starts the import, but if the config can 
> support it natively, that is better.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504659#comment-13504659
 ] 

Yonik Seeley commented on SOLR-4114:


What's the proposed API?  Perhaps a maxShardsPerNode parameter during the 
create?
Seems like it should default to 1 (the current behavior)?
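
For illustration, a create call with the proposed parameter might look like 
this (hypothetical - maxShardsPerNode does not exist yet; the rest is the 
current Collections API):

{code}
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=1&maxShardsPerNode=2
{code}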

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Steve Rowe
Per,

The standard way to develop in Lucene and Solr is against trunk, then backport 
to the stable branch (currently branch_4x), and then to any applicable bugfix 
branches if such branches are expected to be released.  (If a change is too 
radical for the stable branch, it will of course not be backported.)

About maintaining documentation and your putting in effort there: I agree with 
Mark Miller's characterization of people who become committers: first they act 
like committers (e.g. taking responsibility for things they think are 
important; working with other community members to reach agreement; sustained 
substantial contributions; etc.), then PMC member(s) take note and make them a 
committer.

Don't get hung up waiting for a committer to contribute, especially in 
documentation - in fact people new to the process are uniquely positioned to 
recognize gaps here - there is a lot you can do without commit rights.

Steve
 
On Nov 27, 2012, at 9:41 AM, Per Steffensen  wrote:

> Steve Rowe wrote:
>> Hi Per,
>>   
>> 
> Hi Steve
> 
> Thanks a lot for answering so quickly!
>> Have you seen 
>>  ?
>>   
>> 
> Yep, quickly. I found no information there.
>> I don't think the current development branches are listed anywhere, but this 
>> doesn't change very often.  Feel free to add info to the above wiki page.
>>   
>> 
> I could add information, but it will not be worth much if it isn't maintained 
> whenever branch-purposes change. Since I am not a part of the core 
> Solr/Lucene team I don't think I should be the one to maintain it, and if no 
> one from the core team agrees to maintain it, it is probably not worth adding 
> it after all.
>> I recommend you subscribe to the commits mailing list, where you will see 
>> where people commit stuff.  See 
>> 
>> .
>>   
>> 
> Already did
>> 4.0.1 development: 
>> http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_0/
>> 
>> 4.1 development: 
>> http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/
>> 
>>   
>> 
> Thanks, I expected this, just wanted confirmation
>> Note that a 4.0.1 release is looking pretty unlikely, because no-one has 
>> done the work to backport bugfixes committed on branch_4x/ to 
>> lucene_solr_4_0/.  If someone were to do the work, though, such a release 
>> could happen.
>>   
>> 
> Hmmm I would expect that whenever you commit a patch you do it either to 
> branch_4x or lucene_solr_4_0, after you have considered if the patch needs to 
> be in 4.0.1 or if it can wait for 4.1. Then merging changes on 
> lucene_solr_4_0 to branch_4x once in a while, and certainly before releasing 
> 4.1. I wouldn't expect backports from branch_4x to lucene_solr_4_0 to be 
> necessary, since the patches committed to branch_4x should be put there 
> deliberately, because it was decided that they did not belong in 4.0.1.
> 
> But what do I know.
>> Steve
>>   
>> 
> Regards, Per Steffensen
>> On Nov 27, 2012, at 7:48 AM, Per Steffensen 
>>  wrote:
>> 
>>   
>> 
>>> Hi
>>> 
>>> What branches in SVN are currently used for:
>>> - 4.0.1 development
>>> - 4.1 development
>>> 
>>> Can I find updated information about this stuff online, so that I do not 
>>> have to ask?
>>> 
>>> Regards, Per Steffensen
>>> 
>>> -
>>> To unsubscribe, e-mail: 
>>> dev-unsubscr...@lucene.apache.org
>>> 
>>> For additional commands, e-mail: 
>>> dev-h...@lucene.apache.org
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: 
>> dev-unsubscr...@lucene.apache.org
>> 
>> For additional commands, e-mail: 
>> dev-h...@lucene.apache.org
>> 
>> 
>> 
>>   
>> 
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4114) Collection API: Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-4114:
-

Summary: Collection API: Allow multiple shards from one collection on the 
same Solr server  (was: Allow multiple shards from one collection on the same 
Solr server)

> Collection API: Allow multiple shards from one collection on the same Solr 
> server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504654#comment-13504654
 ] 

Per Steffensen commented on SOLR-4114:
--

Yes it is just the collections API. I could have spelled that out more clearly. 
Guess the only clue was the fact that I put the "collection-api" label on :-)

> Allow multiple shards from one collection on the same Solr server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504652#comment-13504652
 ] 

Mark Miller commented on SOLR-4114:
---

This title is a little too general, no? We do actually support multiple shards 
in one collection on the same server; it's just the collections API that 
doesn't support this?

> Allow multiple shards from one collection on the same Solr server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Active 4.x branches?

2012-11-27 Thread Per Steffensen

Steve Rowe wrote:

Hi Per,
  

Hi Steve

Thanks a lot for answering so quickly!

Have you seen  ?
  

Yep, quickly. I found no information there.

I don't think the current development branches are listed anywhere, but this 
doesn't change very often.  Feel free to add info to the above wiki page.
  
I could add information, but it will not be worth much if it isn't 
maintained whenever branch-purposes change. Since I am not a part of the 
core Solr/Lucene team I don't think I should be the one to maintain it, 
and if no one from the core team agrees to maintain it, it is probably 
not worth adding it after all.

I recommend you subscribe to the commits mailing list, where you will see where 
people commit stuff.  See .
  

Already did

4.0.1 development: 
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_0/
4.1 development: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/
  

Thanks, I expected this, just wanted confirmation

Note that a 4.0.1 release is looking pretty unlikely, because no-one has done 
the work to backport bugfixes committed on branch_4x/ to lucene_solr_4_0/.  If 
someone were to do the work, though, such a release could happen.
  
Hmmm I would expect that whenever you commit a patch you do it either to 
branch_4x or lucene_solr_4_0, after you have considered if the patch 
needs to be in 4.0.1 or if it can wait for 4.1. Then merging changes on 
lucene_solr_4_0 to branch_4x once in a while, and certainly before 
releasing 4.1. I wouldn't expect backports from branch_4x to 
lucene_solr_4_0 to be necessary, since the patches committed to 
branch_4x should be put there deliberately, because it was decided that 
they did not belong in 4.0.1.


But what do I know.

Steve
  

Regards, Per Steffensen

On Nov 27, 2012, at 7:48 AM, Per Steffensen  wrote:

  

Hi

What branches in SVN are currently used for:
- 4.0.1 development
- 4.1 development

Can I find updated information about this stuff online, so that I do not have 
to ask?

Regards, Per Steffensen

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


  




Re: Active 4.x branches?

2012-11-27 Thread Steve Rowe
Hi Per,

Have you seen  ?

I don't think the current development branches are listed anywhere, but this 
doesn't change very often.  Feel free to add info to the above wiki page.

I recommend you subscribe to the commits mailing list, where you will see where 
people commit stuff.  See .

4.0.1 development: 
http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_0/
4.1 development: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/

Note that a 4.0.1 release is looking pretty unlikely, because no-one has done 
the work to backport bugfixes committed on branch_4x/ to lucene_solr_4_0/.  If 
someone were to do the work, though, such a release could happen.

Steve

On Nov 27, 2012, at 7:48 AM, Per Steffensen  wrote:

> Hi
> 
> What branches in SVN are currently used for:
> - 4.0.1 development
> - 4.1 development
> 
> Can I find updated information about this stuff online, so that I do not have 
> to ask?
> 
> Regards, Per Steffensen
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4114) Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504602#comment-13504602
 ] 

Per Steffensen commented on SOLR-4114:
--

Patch coming up

> Allow multiple shards from one collection on the same Solr server
> -
>
> Key: SOLR-4114
> URL: https://issues.apache.org/jira/browse/SOLR-4114
> Project: Solr
>  Issue Type: New Feature
>  Components: multicore, SolrCloud
>Affects Versions: 4.0
> Environment: Solr 4.0.0 release
>Reporter: Per Steffensen
>Assignee: Per Steffensen
>  Labels: collection-api, multicore, shard, shard-allocation
>
> We should support running multiple shards from one collection on the same 
> Solr server - e.g. run a collection with 8 shards on a 4 Solr server cluster 
> (each Solr server running 2 shards).
> Performance tests at our side have shown that this is a good idea, and it is 
> also a good idea for easy elasticity later on - it is much easier to move an 
> entire existing shard from one Solr server to another one that just joined 
> the cluster than it is to split an existing shard among the Solr that used to 
> run it and the new Solr.
> See dev mailing list discussion "Multiple shards for one collection on the 
> same Solr server"

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504599#comment-13504599
 ] 

Uwe Schindler edited comment on LUCENE-4576 at 11/27/12 1:17 PM:
-

+1. While working on random access filters, I removed the cache options 
completely, but Mike re-added them later (I have to look up the issue number).
In my opinion, Robert is right: If you don't have a static index, the caching 
of deletes is horrible, as it invalidates the cache on every delete. If you 
have a static index, you should expungeDeletes it, so all acceptDocs will be 
null and BitsFilteredDocIdSet.wrap() (used by the Cache to apply deletes) will 
be a no-op.

  was (Author: thetaphi):
+1. While working on random access filters, I removed the cache options 
completely, but Mike re-added them later (I have to look up the issue number).
In my opinion, Robert is right: If you don't have a static index, the caching 
of deletes is horrible, as it invalidates the cache on every delete. If you 
have a static index, you should expungeDeletes it, so all acceptDocs will be 
null and BitsFilteredDocIdSet (used by the Cache to apply deletes) will be a 
no-op.
  
> Remove CachingWrapperFilter recacheDeletes boolean
> --
>
> Key: LUCENE-4576
> URL: https://issues.apache.org/jira/browse/LUCENE-4576
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>
> I think this option is bad news, its just a trap that causes caches to be 
> uselessly invalidated.
> If you really have a totally static index then just expunge your deletes.
> Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504599#comment-13504599
 ] 

Uwe Schindler commented on LUCENE-4576:
---

+1. While working on random access filters, I removed the cache options 
completely, but Mike re-added them later (I have to look up the issue number).
In my opinion, Robert is right: If you don't have a static index, the caching 
of deletes is horrible, as it invalidates the cache on every delete. If you 
have a static index, you should expungeDeletes it, so all acceptDocs will be 
null and BitsFilteredDocIdSet (used by the Cache to apply deletes) will be a 
no-op.
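
In code, the no-op case looks like this (a sketch of the point above, not a 
patch):

{code}
import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.search.BitsFilteredDocIdSet;
import org.apache.lucene.search.DocIdSet;
import org.apache.lucene.util.Bits;

public class CachedFilterSketch {
  // On a segment with no deletes, getLiveDocs() returns null and
  // wrap() hands the cached set back untouched.
  static DocIdSet applyDeletes(DocIdSet cached, AtomicReader reader) {
    Bits liveDocs = reader.getLiveDocs();
    return BitsFilteredDocIdSet.wrap(cached, liveDocs);
  }
}
{code}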

> Remove CachingWrapperFilter recacheDeletes boolean
> --
>
> Key: LUCENE-4576
> URL: https://issues.apache.org/jira/browse/LUCENE-4576
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Robert Muir
>
> I think this option is bad news, its just a trap that causes caches to be 
> uselessly invalidated.
> If you really have a totally static index then just expunge your deletes.
> Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4576) Remove CachingWrapperFilter recacheDeletes boolean

2012-11-27 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4576:
---

 Summary: Remove CachingWrapperFilter recacheDeletes boolean
 Key: LUCENE-4576
 URL: https://issues.apache.org/jira/browse/LUCENE-4576
 Project: Lucene - Core
  Issue Type: Task
Reporter: Robert Muir


I think this option is bad news, its just a trap that causes caches to be 
uselessly invalidated.

If you really have a totally static index then just expunge your deletes.

Let's remove the option and complexity.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: Multiple shards for one collection on the same Solr server

2012-11-27 Thread Per Steffensen

Well I have created issue SOLR-4114 on the subject. Patch coming up.

Regards, Per Steffensen

Per Steffensen wrote:

Mark Miller wrote:

The Collections API was fairly rushed - so that 4.0 had something easier than 
the CoreAdmin API.
  
Yes I see. Our collection-creation code is more sophisticated than 
yours. We probably would like to migrate to the Solr Collection API 
now anyway - to be using it already when features are added later.

Due to that, it has a variety of limitations:

1. It only picks instances for a collection one way - randomly from the list of 
live instances. This means it's no good for multiple shards on the same 
instance. You should have enough instances to satisfy numShards X 
replicationFactor (although just being short on replicationFactor will 
currently just use what is there)
  
Well I think it shuffles the list of live-nodes and then begins 
assigning shards from one end. That is ok for us for now. But it will 
not start over in the list of live-nodes when there are more shards 
(shards * replica) than instances. This could easily be achieved, 
without making a very fancy allocation algorithm

2. It randomly chooses which instances to use rather than allowing manual 
specification or looking at existing cores.
  
A manual spec would be nice, to be able to control everything if you really 
want to. But you probably also want different built-in 
shard-allocation-strategies that can be used out-of-the-box, e.g. an 
"AlwaysAssignNextShardToInstanceWithFewestShardsAlready" strategy. There are 
also other concerns that might be more interesting for people to have built 
into assignment algorithms - e.g. a rack-aware algorithm that assigns replicas 
of the same slice to instances running on different "racks" (see the sketch 
below).
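
Purely as a sketch of what I mean (none of these interfaces exist in Solr 
today; all names are made up for illustration):

{code:java}
import java.util.List;
import java.util.Map;

// hypothetical pluggable strategy - not an existing Solr API
interface ShardAssignmentStrategy {
  /** choose the live node that should host the next core */
  String pickNode(List<String> liveNodes, Map<String, Integer> coresPerNode);
}

// today's behaviour plus the wrap-around discussed above: node i % N gets
// core i, so numShards * replicationFactor may exceed the node count
class RoundRobinStrategy implements ShardAssignmentStrategy {
  private int next = 0;

  public String pickNode(List<String> liveNodes, Map<String, Integer> coresPerNode) {
    return liveNodes.get(next++ % liveNodes.size());
  }
}

// "AlwaysAssignNextShardToInstanceWithFewestShardsAlready"
class FewestCoresStrategy implements ShardAssignmentStrategy {
  public String pickNode(List<String> liveNodes, Map<String, Integer> coresPerNode) {
    String best = null;
    int bestCount = Integer.MAX_VALUE;
    for (String node : liveNodes) {
      Integer c = coresPerNode.get(node);
      int count = (c == null) ? 0 : c;
      if (count < bestCount) {
        best = node;
        bestCount = count;
      }
    }
    return best;
  }
}
{code}

A rack-aware strategy would just be one more implementation that filters 
liveNodes by rack before picking.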

3. You cannot get responses of success or failure other than polling for the 
expected results later.
  

Well we do that anyway, and will keep doing that in our own code for now.

Someone has a patch up for 3 that I hope to look at soon - others have 
contributed bug fixes that will be in 4.1. We still need to add the ability to 
control placement in other ways though.

I would say there are def plans, but I don't personally know exactly when I'll 
find the time for it, if others don't jump in.
  
Well, I would like to jump in with respect to adding support for running 
several shards of the same collection on the same instance - it is just so 
damn hard to get you to commit stuff :-) and we really don't want too many 
differences between our Solr and Apache Solr (we have enough already - 
SOLR-3178 etc.). It seems like this feature - several shards on the same 
instance - is the only missing feature of the Collection API before we can 
"live with it".

- Mark
  

Regards, Per Steffensen




[jira] [Created] (SOLR-4114) Allow multiple shards from one collection on the same Solr server

2012-11-27 Thread Per Steffensen (JIRA)
Per Steffensen created SOLR-4114:


 Summary: Allow multiple shards from one collection on the same 
Solr server
 Key: SOLR-4114
 URL: https://issues.apache.org/jira/browse/SOLR-4114
 Project: Solr
  Issue Type: New Feature
  Components: multicore, SolrCloud
Affects Versions: 4.0
 Environment: Solr 4.0.0 release
Reporter: Per Steffensen
Assignee: Per Steffensen


We should support running multiple shards from one collection on the same Solr 
server - e.g. running a collection with 8 shards on a 4-server cluster (each 
Solr server running 2 shards).

Performance tests on our side have shown that this is a good idea, and it is 
also a good idea for easy elasticity later on - it is much easier to move an 
entire existing shard from one Solr server to another that has just joined the 
cluster than it is to split an existing shard between the Solr server that 
used to run it and the new one.

See the dev mailing list discussion "Multiple shards for one collection on the 
same Solr server".
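
For illustration, creating such an over-allocated collection could then look 
like this on a 4-node cluster (maxShardsPerNode is the parameter name this 
patch proposes; it is not in any release yet):

{noformat}
$ curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=8&replicationFactor=1&maxShardsPerNode=2'
{noformat}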




Active 4.x branches?

2012-11-27 Thread Per Steffensen

Hi

What branches in SVN are currently used for:
- 4.0.1 development
- 4.1 development

Can I find updated information about this stuff online, so that I do not 
have to ask?


Regards, Per Steffensen




[jira] [Comment Edited] (SOLR-139) Support updateable/modifiable documents

2012-11-27 Thread Lukas Graf (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504540#comment-13504540
 ] 

Lukas Graf edited comment on SOLR-139 at 11/27/12 11:38 AM:


This feature doesn't work as advertised in Solr 4.0.0 (final).

Since it's not documented, I used the information in these blog posts 
([yonik.com|http://yonik.com/solr/atomic-updates/], 
[solr.pl|http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/]) and 
this ticket to try to get it working, and asked in the #solr IRC channel, to no 
avail.

Whenever I use the 'set' command in an update message, it mangles the value to 
something like {code:xml}<str name="Title">{set=My new title}</str>{code} and 
drops all other fields.

I tried the JSON as well as the XML syntax for the update message, and I tried 
it both with a manually defined '_version_' field and without.

Relevant parts from my schema.xml:

{code:xml}
<schema name="example" version="1.5">

<fields>
  <field name="UID" type="string" indexed="true" stored="true" required="true"/>
  <field name="Title" type="string" indexed="true" stored="true"/>
  <field name="Creator" type="string" indexed="true" stored="true"/>
  <field name="_version_" type="long" indexed="true" stored="true"/>
  <!-- remaining field definitions and attributes were stripped by the mail
       archive; only the field names above are certain from the text -->
</fields>

<uniqueKey>UID</uniqueKey>

</schema>
{code}

I initially created some content like this:

{noformat}
$ curl 'localhost:8983/solr/update?commit=true' -H 
'Content-type:application/json' -d '[{"UID":"7cb8a43c","Title":"My original 
Title", "Creator": "John Doe"}]'
{noformat}

Which resulted in this document:

{code:xml}
<doc>
  <str name="UID">7cb8a43c</str>
  <str name="Title">My original Title</str>
  <str name="Creator">John Doe</str>
</doc>
{code}

Then I tried to update that document with this statement:

{noformat}
$ curl 'localhost:8983/solr/update?commit=true' -H 
'Content-type:application/json' -d '[{"UID":"7cb8a43c","Title":{"set":"My new 
title"}}]'
{noformat}

Which resulted in this mangled document:

{code:xml}
<doc>
  <str name="UID">7cb8a43c</str>
  <str name="Title">{set=My new title}</str>
</doc>
{code}

(I would have expected the document to still have the value 'John Doe' for the 
'Creator' field, and the value of its 'Title' field updated to 'My new 
title'.)

I tried using the XML format for the update message as well:

{code:xml}
<add>
  <doc>
    <field name="UID">7cb8a43c</field>
    <field name="Title" update="set">My new title</field>
  </doc>
</add>
{code}

Same result as above.
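
A likely cause (an assumption, not verified against this setup) is that atomic 
updates are only rewritten into a full document when the update log is enabled 
in solrconfig.xml; without it, the raw map is indexed verbatim. The relevant 
section would look like this, and all fields also need to be stored:

{code:xml}
<!-- assumption: atomic updates require the transaction log -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
{code}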

  was (Author: lukasgraf): (previous revision identical to the comment text 
above)
> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
> SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumen

[jira] [Commented] (SOLR-139) Support updateable/modifiable documents

2012-11-27 Thread Lukas Graf (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504540#comment-13504540
 ] 

Lukas Graf commented on SOLR-139:
-

(Comment text identical to the Comment Edited message above.)

> Support updateable/modifiable documents
> ---
>
> Key: SOLR-139
> URL: https://issues.apache.org/jira/browse/SOLR-139
> Project: Solr
>  Issue Type: New Feature
>  Components: update
>Reporter: Ryan McKinley
> Fix For: 4.0
>
> Attachments: Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, 
> Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, 
> getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, 
> SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, 
> SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, 
> SOLR-269+139-ModifiableDocumentUpdateProcessor.patch
>
>
> It would be nice to be able to update some fields on a document without 
> having to insert the entire document.
> Given the way lucene is structured, (for now) one can only modify stored 
> fields.
> While we are at it, we can support incrementing an existing value - I think 
> this only makes sense for numbers.
> for background, see:
> http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293




Re: How does lucene handle the wildcard and fuzzy queries internally?

2012-11-27 Thread Federico Méndez
As an introduction you can read this wonderful article:
http://java.dzone.com/news/lucenes-fuzzyquery-100-times

On Tue, Nov 27, 2012 at 10:08 AM, sri krishna  wrote:

>
> How does lucene handle the wildcard and fuzzy queries internally?
>
> It looks like the data is stored as term -> postings list. What data
> structures are used to produce efficient results?
>
> If it uses a compressed trie, how does it handle segment merging
> efficiently? If it uses just a linear scan to find the words in the query,
> how are prefix-based terms found? Can anyone give more detailed
> explanations of how such advanced queries are handled in Lucene, from an
> efficiency point of view.
>
>
> Thanks
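
The short version of that article: since 4.0, FuzzyQuery compiles the term and 
maximum edit distance into a Levenshtein automaton and intersects it with the 
FST-based terms dictionary, rather than brute-force scanning every term. A 
minimal sketch (field name and content are made up):

{code:java}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.FuzzyQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class FuzzyDemo {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir, new IndexWriterConfig(
        Version.LUCENE_40, new StandardAnalyzer(Version.LUCENE_40)));
    Document doc = new Document();
    doc.add(new TextField("title", "lucene in action", Store.YES));
    w.addDocument(doc);
    w.close();

    IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(dir));
    // up to 2 edits ("lucnee" -> "lucene"); the query term is compiled into
    // a Levenshtein automaton and intersected with the terms dictionary
    TopDocs hits = searcher.search(new FuzzyQuery(new Term("title", "lucnee"), 2), 10);
    System.out.println(hits.totalHits); // 1
  }
}
{code}

Wildcard and prefix queries work the same way at query time - the pattern 
becomes an automaton intersected with the term dictionary - and segment merging 
needs nothing special, since each segment's terms are already stored in sorted 
order.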


[jira] [Commented] (SOLR-2141) NullPointerException when using escapeSql function

2012-11-27 Thread Dominik Siebel (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13504443#comment-13504443
 ] 

Dominik Siebel commented on SOLR-2141:
--

Hi [~jdyer],
Indexing the whole dataset looks good, so things seem to be fixed now. Thanks 
for that!

Just another dumb question:
What do I use the patch for? Aren't those the same changes that are already in 
branch_4x and trunk?

> NullPointerException when using escapeSql function
> --
>
> Key: SOLR-2141
> URL: https://issues.apache.org/jira/browse/SOLR-2141
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - DataImportHandler
>Affects Versions: 1.4.1, 4.0
> Environment: openjdk 1.6.0 b12
>Reporter: Edward Rudd
>Assignee: James Dyer
> Fix For: 4.1, 5.0
>
> Attachments: dih-config.xml, dih-file.xml, SOLR-2141.b341f5b.patch, 
> SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, SOLR-2141.patch, 
> SOLR-2141.patch, SOLR-2141-sample.patch, SOLR-2141-test.patch
>
>
> I have two entities defined, nested in each other (the inline config was 
> stripped by the mail archive; dih-config.xml is attached).
> Now, when I run that, it bombs on any article where subcategory = '' (it's a 
> NOT NULL column, so the empty string is there). If I add subcategory != '' 
> to the article query it works fine (aside from not pulling in all of the 
> articles).
> org.apache.solr.handler.dataimport.DataImportHandlerException: 
> java.lang.NullPointerException
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:383)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
> at 
> org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
> at 
> org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
> at 
> org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
> Caused by: java.lang.NullPointerException
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$1.evaluate(EvaluatorBag.java:75)
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:216)
> at 
> org.apache.solr.handler.dataimport.EvaluatorBag$5.get(EvaluatorBag.java:204)
> at 
> org.apache.solr.handler.dataimport.VariableResolverImpl.resolve(VariableResolverImpl.java:107)
> at 
> org.apache.solr.handler.dataimport.TemplateString.fillTokens(TemplateString.java:81)
> at 
> org.apache.solr.handler.dataimport.TemplateString.replaceTokens(TemplateString.java:75)
> at 
> org.apache.solr.handler.dataimport.VariableResolverImpl.replaceTokens(VariableResolverImpl.java:87)
> at 
> org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:71)
> at 
> org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
> at 
> org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:357)
> ... 6 more
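
Since the nested-entity config was stripped, here is a hypothetical 
reconstruction (entity and column names invented) of the pattern that triggers 
the NPE - escapeSql apparently blows up when the resolved value it is handed is 
null, which happens here when subcategory is the empty string:

{code:xml}
<!-- hypothetical reconstruction, not the reporter's actual config -->
<entity name="article" query="SELECT id, title, subcategory FROM article">
  <entity name="category"
          query="SELECT * FROM category WHERE name =
                 '${dataimporter.functions.escapeSql(article.subcategory)}'"/>
</entity>
{code}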
