[jira] [Created] (LUCENE-4680) Add reusability to FacetFields

2013-01-10 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-4680:
--

 Summary: Add reusability to FacetFields
 Key: LUCENE-4680
 URL: https://issues.apache.org/jira/browse/LUCENE-4680
 Project: Lucene - Core
  Issue Type: Improvement
  Components: modules/facet
Reporter: Shai Erera


In LUCENE-4647 I added a TODO to handle reusability in this class. Currently it 
allocates two new TokenStreams for every document, as well as some BytesRefs 
and an IntsRef. I think it should be possible to reuse those across documents 
(and also the Field instances, while we're at it).

It will make the class not thread-safe, but I don't think that's an important 
feature. {{CategoryDocumentBuilder}} (its predecessor) wasn't thread-safe 
either, and {{Field}} isn't thread-safe, so it's fine by me if {{FacetFields}} 
isn't thread-safe either.
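
For illustration, the kind of reuse proposed here might look roughly like the 
following (a hypothetical sketch with made-up names, not the eventual patch):

{code}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;

// Hypothetical sketch: buffers allocated once per FacetFields instance and
// reused for every document, instead of per-document allocations.
class ReusableBuffers {
  final BytesRef payloadBuf = new BytesRef(128); // reused encoded output
  final IntsRef ordinalsBuf = new IntsRef(32);   // reused ordinals scratch

  void startDocument() {
    ordinalsBuf.length = 0; // reset instead of reallocating
    payloadBuf.length = 0;
    // Reusing these (and the Field/TokenStream instances) is exactly what
    // makes the class non-thread-safe, as noted above.
  }
}
{code}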




[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550863#comment-13550863
 ] 

Shai Erera commented on LUCENE-4620:


bq. I'll open an issue to take care of FacetFields reusability

Done. Opened LUCENE-4680.

> Explore IntEncoder/Decoder bulk API
> ---
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so maybe decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.
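
The proposed bulk API shape could look like this (a sketch only; names and 
semantics here are assumptions, not the final design):

{code}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;

// Sketch of the proposed bulk API; not the final design.
abstract class BulkIntEncoder {
  /** Encodes all values in ords into buf in one call. */
  abstract void encode(IntsRef ords, BytesRef buf);
}

abstract class BulkIntDecoder {
  /** Decodes buf into ords, setting ords.length to the number of values. */
  abstract void decode(BytesRef buf, IntsRef ords);
}
{code}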




[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550852#comment-13550852
 ] 

Shai Erera commented on LUCENE-4620:


bq. I think we should remove it

Ok I will.

bq. It is unfortunate that the common case is often held back by the full 
flexibility/generality of the facet module

With LUCENE-4647, the common case suffers less from the full generality of the 
facets module. I'll open an issue to take care of FacetFields reusability, and 
there I hope to successfully tackle the reusability of BytesRefs for one as 
well as for many CLPs.

IMO though, having a single entry point for users to index facets, be it 1 
facet per document or 2500 (a real case!), is important. We need to make sure, 
though, that the 1-facet case incurs the least overhead (e.g. by using 
Collections.singletonMap, or the trick I've done in 
CountingListBuilder.OrdinalsEncoder (with/without partitions)).
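
A minimal sketch of the singletonMap idea (everything except 
Collections.singletonMap is hypothetical):

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CategoryGrouper {
  // Hypothetical sketch: the 1-facet case avoids allocating a HashMap.
  static Map<String, List<String>> group(String list, List<String> categories) {
    if (categories.size() == 1) {
      // single facet: no HashMap, no resizing, minimal garbage
      return Collections.singletonMap(list, categories);
    }
    Map<String, List<String>> m = new HashMap<String, List<String>>();
    m.put(list, categories); // the general path would split by list here
    return m;
  }
}
{code}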

> Explore IntEncoder/Decoder bulk API
> ---
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so maybe decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.




[jira] [Resolved] (SOLR-3982) Admin UI: Various Dataimport Improvements

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-3982.
-

   Resolution: Fixed
Fix Version/s: (was: 4.2)
   (was: 5.0)
   4.1

> Admin UI: Various Dataimport Improvements
> -
>
> Key: SOLR-3982
> URL: https://issues.apache.org/jira/browse/SOLR-3982
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Shawn Heisey
>Assignee: Stefan Matheis (steffkes)
> Fix For: 4.1
>
> Attachments: SOLR-3982.patch, SOLR-3982.patch, SOLR-3982.patch
>
>
> Started with Shawn's request for a small refresh link; one change led to 
> the next, which is why I changed this issue into a more general one.
> This patch brings:
> * A "Refresh Status" button
> * An "Abort Import" button
> * Improved status handling
> _(this was buggy if you had multiple cores with dataimport handlers defined 
> and switched the view while at least one was running)_
> * Additional stats on rows/documents
> _(an on-the-fly calculated "X docs/second")_
> * A less buggy duration-to-readable-time conversion
> _(which until now could result in NaNs showing up on your screen)_
> Original Description:
> {quote}The dataimport section under each core on the admin gui does not 
> provide a way to get the current import status.  I actually would like to see 
> it automatically pull the status as soon as you click on "Dataimport" ... I 
> have never seen an import status with a qtime above 1 millisecond.  A refresh 
> icon/link would be good to have as well.
> Additional note: the resulting URL in the address bar is a little odd:
> http://server:port/solr/#/corename/dataimport//dataimport{quote}
> Though I already gave a short explanation of why the URL looks a bit odd:
> the first "dataimport" is required for the UI to detect which section you're 
> browsing ... the second "/dataimport" (including the slash, yes) comes 
> from your solrconfig :)




[jira] [Created] (SOLR-4296) Admin UI : Improve Dataimport Auto-Refresh

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)
Stefan Matheis (steffkes) created SOLR-4296:
---

 Summary: Admin UI : Improve Dataimport Auto-Refresh
 Key: SOLR-4296
 URL: https://issues.apache.org/jira/browse/SOLR-4296
 Project: Solr
  Issue Type: Improvement
  Components: web gui
Affects Versions: 4.0
Reporter: Stefan Matheis (steffkes)
Assignee: Stefan Matheis (steffkes)
Priority: Minor


As a follow-up to SOLR-3982, the auto-refresh option could be extended.

The current workflow can be described as follows:
{quote}If you click "Execute", the full- or delta-import is started; 
afterwards a status update is triggered automatically. If the status is 'busy' 
and you have checked the auto-refresh option, another status update is 
scheduled in 2 seconds.{quote}

Having said that, what isn't caught at the moment is an "external 
import trigger": the UI does not detect that, so one has to trigger the status 
update at least once manually to join the "status cycle" described in the 
workflow.
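
For context, the status update the UI triggers corresponds to a plain 
dataimport status request; in SolrJ that might look like this sketch (the URL, 
core name, and handler path are examples):

{code}
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.common.util.NamedList;

class DihStatusPoll {
  static NamedList<Object> status() throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/corename");
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("command", "status");
    QueryRequest req = new QueryRequest(params);
    req.setPath("/dataimport"); // the handler path from solrconfig.xml
    // while the reported status is "busy", the UI schedules another poll
    // roughly 2 seconds later
    return server.request(req);
  }
}
{code}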




[jira] [Commented] (SOLR-3982) Admin UI: Various Dataimport Improvements

2013-01-10 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550595#comment-13550595
 ] 

Commit Tag Bot commented on SOLR-3982:
--

[branch_4x commit] Stefan Matheis
http://svn.apache.org/viewvc?view=revision&revision=1431758

SOLR-3982: Admin UI: Various Dataimport Improvements (merge r1431756)


> Admin UI: Various Dataimport Improvements
> -
>
> Key: SOLR-3982
> URL: https://issues.apache.org/jira/browse/SOLR-3982
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Shawn Heisey
>Assignee: Stefan Matheis (steffkes)
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-3982.patch, SOLR-3982.patch, SOLR-3982.patch
>
>
> Started with Shawn's request for a small refresh link; one change led to 
> the next, which is why I changed this issue into a more general one.
> This patch brings:
> * A "Refresh Status" button
> * An "Abort Import" button
> * Improved status handling
> _(this was buggy if you had multiple cores with dataimport handlers defined 
> and switched the view while at least one was running)_
> * Additional stats on rows/documents
> _(an on-the-fly calculated "X docs/second")_
> * A less buggy duration-to-readable-time conversion
> _(which until now could result in NaNs showing up on your screen)_
> Original Description:
> {quote}The dataimport section under each core on the admin gui does not 
> provide a way to get the current import status.  I actually would like to see 
> it automatically pull the status as soon as you click on "Dataimport" ... I 
> have never seen an import status with a qtime above 1 millisecond.  A refresh 
> icon/link would be good to have as well.
> Additional note: the resulting URL in the address bar is a little odd:
> http://server:port/solr/#/corename/dataimport//dataimport{quote}
> Though I already gave a short explanation of why the URL looks a bit odd:
> the first "dataimport" is required for the UI to detect which section you're 
> browsing ... the second "/dataimport" (including the slash, yes) comes 
> from your solrconfig :)




[jira] [Commented] (SOLR-3982) Admin UI: Various Dataimport Improvements

2013-01-10 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550591#comment-13550591
 ] 

Commit Tag Bot commented on SOLR-3982:
--

[trunk commit] Stefan Matheis
http://svn.apache.org/viewvc?view=revision&revision=1431756

SOLR-3982: Admin UI: Various Dataimport Improvements


> Admin UI: Various Dataimport Improvements
> -
>
> Key: SOLR-3982
> URL: https://issues.apache.org/jira/browse/SOLR-3982
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Shawn Heisey
>Assignee: Stefan Matheis (steffkes)
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-3982.patch, SOLR-3982.patch, SOLR-3982.patch
>
>
> Started with Shawn's request for a small refresh link; one change led to 
> the next, which is why I changed this issue into a more general one.
> This patch brings:
> * A "Refresh Status" button
> * An "Abort Import" button
> * Improved status handling
> _(this was buggy if you had multiple cores with dataimport handlers defined 
> and switched the view while at least one was running)_
> * Additional stats on rows/documents
> _(an on-the-fly calculated "X docs/second")_
> * A less buggy duration-to-readable-time conversion
> _(which until now could result in NaNs showing up on your screen)_
> Original Description:
> {quote}The dataimport section under each core on the admin gui does not 
> provide a way to get the current import status.  I actually would like to see 
> it automatically pull the status as soon as you click on "Dataimport" ... I 
> have never seen an import status with a qtime above 1 millisecond.  A refresh 
> icon/link would be good to have as well.
> Additional note: the resulting URL in the address bar is a little odd:
> http://server:port/solr/#/corename/dataimport//dataimport{quote}
> Though I already gave a short explanation of why the URL looks a bit odd:
> the first "dataimport" is required for the UI to detect which section you're 
> browsing ... the second "/dataimport" (including the slash, yes) comes 
> from your solrconfig :)




[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550568#comment-13550568
 ] 

Steve Rowe commented on LUCENE-4134:


bq. I think Maven can technically handle artifact naming schemes that differ 
from artifactId-version(-type).jar, but I've never done that before, and I 
personally don't think it's worth the effort, especially given the IMHO goofy 
result.

Hmm, I just realized that there are actually two possible ways to address the 
naming difference - the above comment confuses the second with the first: 

# Change the Solr artifactIds, and maybe also the groupId, by prepending 
"apache-".  This is what I was calling goofy.
# Change the Solr artifact filenames by prepending "apache-", without changing 
the artifactIds or the groupId.

Thinking about it more, I'm dubious that #2 is even possible; while I know that 
maven-jar-plugin (the Maven jar producer) allows for its output to be named 
non-conventionally, I don't know how Maven resolution would work if the 
filename starts with something other than the artifactId.

> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir used for publishing release artifacts also needs to be managed via svn.




[jira] [Updated] (SOLR-4295) SolrQuery setFacet*() and getFacet*() should have versions that specify the field

2013-01-10 Thread Colin Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Bartolome updated SOLR-4295:
--

Description: 
Since the parameter names for field-specific faceting parameters are a little 
odd, such as "f.field_name.facet.prefix", the SolrQuery class should have 
methods that take a "field" parameter. The SolrQuery.setFacetPrefix() method 
already takes such a parameter. It would be great if the rest of the 
setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.

Also, as far as I can tell, there isn't a constant for the "f." prefix. That 
would be helpful, too.

  was:
Since the parameter names for field-specific faceting parameters are a little 
odd (and undocumented), such as "f.field_name.facet.prefix", the SolrQuery 
class should have methods that take a "field" parameter. The 
SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
great if the rest of the setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.

Also, as far as I can tell, there isn't a constant for the "f." prefix. That 
would be helpful, too.


> SolrQuery setFacet*() and getFacet*() should have versions that specify the 
> field
> -
>
> Key: SOLR-4295
> URL: https://issues.apache.org/jira/browse/SOLR-4295
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 4.0
>Reporter: Colin Bartolome
>Priority: Minor
>
> Since the parameter names for field-specific faceting parameters are a little 
> odd, such as "f.field_name.facet.prefix", the SolrQuery class should have 
> methods that take a "field" parameter. The SolrQuery.setFacetPrefix() method 
> already takes such a parameter. It would be great if the rest of the 
> setFacet*() and getFacet*() methods did, too.
> The workaround is trivial, albeit clumsy: just create the parameter names by 
> hand, as necessary.
> Also, as far as I can tell, there isn't a constant for the "f." prefix. That 
> would be helpful, too.
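
The by-hand workaround reads roughly like this (field name and values are 
examples):

{code}
import org.apache.solr.client.solrj.SolrQuery;

class PerFieldFacetParams {
  static SolrQuery build() {
    SolrQuery q = new SolrQuery("*:*");
    q.setFacet(true);
    q.addFacetField("category");
    // Workaround: build the per-field parameter names by hand, since only
    // setFacetPrefix() currently takes a field argument.
    q.set("f.category.facet.limit", "20");
    q.set("f.category.facet.mincount", "1");
    return q;
  }
}
{code}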




[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550555#comment-13550555
 ] 

Steve Rowe commented on LUCENE-4134:


{quote}
bq. What "other script" did you have in mind for the maven files?

I just meant whatever we currently do to push them to wherever we push them 
once the VOTE is official -- if that's currently bundled up in a script that 
also scp's the files to people.apache.org:/dist, then let's only worry about 
changing the people.apache.org part to start committing to svn, and worry about 
switching to RCs in svn and how we upload to maven from there later.
{quote}

The process is here: 
[http://wiki.apache.org/lucene-java/PublishMavenArtifacts].  It's a two-step 
process: first an Ant task stages the artifacts to the Nexus repository at 
{{repository.apache.org}}.  Then, when the VOTE succeeds, the RM clicks a button 
on the Nexus web interface to publish them, and a few hours later they get 
synced to the Maven central repository.


> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir used for publishing release artifacts also needs to be managed via svn.




[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 11:08 PM:
---

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do a memcpy in all cases, but often uses CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
Java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards. So far the only speedups I have seen are 
when I adopt most of the behaviors of the JVM; the biggest win really is that 
the code becomes a lot simpler (partly because we don't have to worry about 
the "cleaner", and partly because we are not bound to int32 sizes, so no more 
slice nonsense). Despite the simpler code, I don't think there is a sizable 
win in performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts, but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident.)

{quote}
I would not provide a custom MMapDir at all; it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree. Even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.
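
For readers following along, the "slice nonsense" comes from ByteBuffer's 
int-based indexing, which forces files larger than 2 GB to be mapped and 
addressed in chunks, roughly like this generic sketch (not Greg's native 
implementation):

{code}
import java.nio.ByteBuffer;

/** Generic sketch of chunked mmap addressing; not the native impl. */
class ChunkedReader {
  private static final int CHUNK_POWER = 30; // 1 GB chunks
  private static final long CHUNK_MASK = (1L << CHUNK_POWER) - 1;
  private final ByteBuffer[] chunks; // one mapped buffer per chunk

  ChunkedReader(ByteBuffer[] chunks) { this.chunks = chunks; }

  byte readByte(long pos) {
    // ByteBuffer indexes with int, hence the split into chunk + offset.
    return chunks[(int) (pos >>> CHUNK_POWER)].get((int) (pos & CHUNK_MASK));
  }
}
{code}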

  was (Author: gbow...@fastmail.co.uk):
{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do a memcpy in all cases, but often uses CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
Java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards. So far the only speedups I have seen are 
when I adopt most of the behaviors of the JVM; the biggest win really is that 
the code becomes a lot simpler (partly because we don't have to worry about 
the "cleaner", and partly because we are not bound to int32 sizes, so no more 
slice nonsense). Despite the simpler code, I don't think there is a sizable 
win in performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts, but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident.)

{quote}
I would not provide a custom MMapDir at all; it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree. Even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.

[jira] [Comment Edited] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550551#comment-13550551
 ] 

Steve Rowe edited comment on LUCENE-4134 at 1/10/13 11:06 PM:
--

bq. personally i would prefer if we don't have a separate script for changing 
the maven files.  I'm not really sure what this tester is currently doing.

s/changing/checking/ ?

Here's what the maven artifact checking portion of the smoke tester currently 
does:

# Downloads the POM templates from the branch tag in Subversion (for later 
checking that all checked-in POM templates have corresponding artifacts)
# Downloads all the files under the {{maven/}} directories at the RC location
# Verifies that there is a deployed POM for each binary jar/war
# Verifies there is a binary jar for each POM template
# Verifies that the md5/sha1 digests for each Maven jar/war exist and are 
correct
# Verifies there is a source and javadocs jar for each binary jar
# Verifies that each deployed POM's artifactId/groupId (pulled from the POM) 
matches the POM's dir+filename
# Verifies that there is the binary jar for each deployed POM
# Downloads and unpacks the official distributions, and also unpacks the Solr 
war
# Verifies that the Maven binary artifacts have same-named files (after adding 
"apache-" to the Maven Solr jars/war)

There are a couple of additional steps in there to handle non-Mavenized 
dependencies, which we don't have anymore; those steps could be removed.
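
For illustration, the digest verification in step 5 boils down to something 
like the following sketch (the real smoke tester differs; this only shows the 
idea):

{code}
import java.io.FileInputStream;
import java.io.InputStream;
import java.security.MessageDigest;

class DigestCheck {
  // Computes a file's hex digest ("MD5" or "SHA-1"); the result is compared
  // against the contents of the corresponding .md5/.sha1 file.
  static String hexDigest(String path, String algo) throws Exception {
    MessageDigest md = MessageDigest.getInstance(algo);
    InputStream in = new FileInputStream(path);
    try {
      byte[] buf = new byte[8192];
      for (int n; (n = in.read(buf)) != -1; ) {
        md.update(buf, 0, n);
      }
    } finally {
      in.close();
    }
    StringBuilder hex = new StringBuilder();
    for (byte b : md.digest()) {
      hex.append(String.format("%02x", b));
    }
    return hex.toString();
  }
}
{code}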

bq. Its scary to me that different build systems are producing different 
artifacts

*All* the Maven artifacts are produced by Ant, not by Maven and not by 
maven-ant-tasks. 

bq. And i know the checking isn't good enough when i see basic shit like things 
not even named the same way: SOLR-4287

maven-ant-tasks renames the Solr artifacts based on the Maven jar naming 
convention: artifactId-version(-type).jar - groupId org.apache.solr is not 
included.  This has been the Solr Maven artifact naming scheme since Solr 
artifacts started being published on the Maven central repository (v1.3).  
Using the Solr naming convention would result in the coordinates 
{{org.apache.solr.apache-solr.\*}}, or maybe even 
{{org.apache.apache-solr:apache-solr.\*}}, both of which look goofy to me.

I *think* Maven can technically handle artifact naming schemes that differ from 
artifactId-version(-type).jar, but I've never done that before, and I 
personally don't think it's worth the effort, especially given the IMHO goofy 
result.  Before SOLR-4287, I haven't seen anybody complain.  (If you look at 
SOLR-4287, by the way, the suggestion isn't to change Maven naming, it's to 
change the official Solr artifact naming.)  

  was (Author: steve_rowe):
bq. personally i would prefer if we don't have a separate script for 
changing the maven files.
I'm not really sure what this tester is currently doing.

s/changing/checking/ ?

Here's what the maven artifact checking portion of the smoke tester currently 
does:

# Downloads the POM templates from the branch tag in Subversion (for later 
checking that all checked-in POM templates have corresponding artifacts)
# Downloads all the files under the {{maven/}} directories at the RC location
# Verifies that there is a deployed POM for each binary jar/war
# Verifies there is a binary jar for each POM template
# Verifies that the md5/sha1 digests for each Maven jar/war exist and are 
correct
# Verifies there is a source and javadocs jar for each binary jar
# Verifies that each deployed POM's artifactId/groupId (pulled from the POM) 
matches the POM's dir+filename
# Verifies that there is the binary jar for each deployed POM
# Downloads and unpacks the official distributions, and also unpacks the Solr 
war
# Verifies that the Maven binary artifacts have same-named files (after adding 
"apache-" to the Maven Solr jars/war)

There are a couple of additional steps in there to handle non-Mavenized 
dependencies, which we don't have anymore; those steps could be removed.

bq. Its scary to me that different build systems are producing different 
artifacts

*All* the Maven artifacts are produced by Ant, not by Maven and not by 
maven-ant-tasks. 

bq. And i know the checking isn't good enough when i see basic shit like things 
not even named
the same way: SOLR-4287

maven-ant-tasks renames the Solr artifacts based on the Maven jar naming 
convention: artifactId-version(-type).jar - groupId org.apache.solr is not 
included.  This has been the Solr Maven artifact naming scheme since Solr 
artifacts started being published on the Maven central repository (v1.3).  
Using the Solr naming convention would result in the coordinates 
{{org.apache.solr.apache-solr.*}}, or maybe even 
{{org.apache.apache-solr:apache-solr.*}}, both of which look goofy to me.

I *think* Maven can technically handle artifact naming schemes that differ from 
artifactId-version(-type).jar, but I've never done that before, and I 
personally don't think it's worth the effort, especially given the IMHO goofy 
result.  Before SOLR-4287, I haven't seen anybody complain.  (If you look at 
SOLR-4287, by the way, the suggestion isn't to change Maven naming, it's to 
change the official Solr artifact naming.)

[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550551#comment-13550551
 ] 

Steve Rowe commented on LUCENE-4134:


bq. personally i would prefer if we don't have a separate script for changing 
the maven files.
I'm not really sure what this tester is currently doing.

s/changing/checking/ ?

Here's what the maven artifact checking portion of the smoke tester currently 
does:

# Downloads the POM templates from the branch tag in Subversion (for later 
checking that all checked-in POM templates have corresponding artifacts)
# Downloads all the files under the {{maven/}} directories at the RC location
# Verifies that there is a deployed POM for each binary jar/war
# Verifies there is a binary jar for each POM template
# Verifies that the md5/sha1 digests for each Maven jar/war exist and are 
correct
# Verifies there is a source and javadocs jar for each binary jar
# Verifies that each deployed POM's artifactId/groupId (pulled from the POM) 
matches the POM's dir+filename
# Verifies that there is the binary jar for each deployed POM
# Downloads and unpacks the official distributions, and also unpacks the Solr 
war
# Verifies that the Maven binary artifacts have same-named files (after adding 
"apache-" to the Maven Solr jars/war)

There are a couple of additional steps in there to handle non-Mavenized 
dependencies, which we don't have anymore; those steps could be removed.

bq. Its scary to me that different build systems are producing different 
artifacts

*All* the Maven artifacts are produced by Ant, not by Maven and not by 
maven-ant-tasks. 

bq. And i know the checking isn't good enough when i see basic shit like things 
not even named
the same way: SOLR-4287

maven-ant-tasks renames the Solr artifacts based on the Maven jar naming 
convention: artifactId-version(-type).jar - groupId org.apache.solr is not 
included.  This has been the Solr Maven artifact naming scheme since Solr 
artifacts started being published on the Maven central repository (v1.3).  
Using the Solr naming convention would result in the coordinates 
{{org.apache.solr.apache-solr.*}}, or maybe even 
{{org.apache.apache-solr:apache-solr.*}}, both of which look goofy to me.

I *think* Maven can technically handle artifact naming schemes that differ from 
artifactId-version(-type).jar, but I've never done that before, and I 
personally don't think it's worth the effort, especially given the IMHO goofy 
result.  Before SOLR-4287, I haven't seen anybody complain.  (If you look at 
SOLR-4287, by the way, the suggestion isn't to change Maven naming, it's to 
change the official Solr artifact naming.)  

> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir used for publishing release artifacts also needs to be managed via svn.




Re: 4.1 release

2013-01-10 Thread Jack Krupansky
The window of Monday through Wednesday sounds like a great target. Nothing 
says that the first RC has to be final. If whoever is doing the branch wants 
to do it on Monday rather than Tuesday, fine. If one or more of these nasty 
"blockers" gets fixed on Tuesday, we should still be open to a re-spin to 
put quality over a mere day or two of delay. But draw a hard line on 
Wednesday.


-- Jack Krupansky

-Original Message- 
From: Mark Miller

Sent: Thursday, January 10, 2013 3:36 PM
To: dev@lucene.apache.org
Subject: Re: 4.1 release

Saying tomorrow, without any date that gives anyone any time to do anything, 
is out of nowhere to me. People in Europe and east of there will wake up and 
find out: oh, it's today. While pressure has been building towards a release, 
no one has proposed a date for a cutoff, and I think having one is only fair. 
I think that if you were desperate to cut over to blockers-only tomorrow, you 
should have called for that last week.


Robert Muir's short term releases are not threatened by allowing people to 
plan and execute a release together. You can take that too far and do damage 
from the opposite direction. Giving people time to tie things up with a real 
deadline is only fair. We all know a nebulous deadline is not conducive to 
finishing up work.


I think all releases should have a known date that we agree on that gives 
developers some time to finish what they are working on or what they believe 
is important for the release. At a minimum there should be a few days for 
this. A weekend involved only seems fair. This doesn't have to be a long 
time, but it should not require that we file blockers; it just seems like a 
friendly way to develop together.


Monday is fine by me if others buy into it.

Otherwise, we have taken 4 or 5 months for 4.1. Let's not drag it out 
another month. But let's not do the reverse and release it tonight. The 
sensible approach always seems to be to plan out some target dates on the 
list - dates that actually give devs a chance to respond - and then follow 
through on those dates.


- Mark

On Jan 10, 2013, at 3:26 PM, Steve Rowe  wrote:

Okay - I can see your logic, Mark, but this is not even close to out of 
nowhere.  You yourself have been vocal about making a 4.1 release for a 
couple weeks now.


I agree with Robert Muir that we should be promoting short turnaround 
releases.  If it doesn't make this release, it'll make the next one, which 
will come out in a relatively short span of time.  In this model, Blocker 
issues are the drivers, not "Fix Version". If people want stuff in the 
release, they should mark their issue as Blocker.


How about a compromise - next Monday we branch and only allow Blockers to 
block the release?


Steve

On Jan 10, 2013, at 3:08 PM, Mark Miller  wrote:

-1 from me - I don't like not giving people a target date to clean things 
up by. No one has given a proposed date to try and tie things up by - 
just calling 'hike is tomorrow' out of nowhere doesn't seem right to me.


We have a lot of people working on this over a lot of timezones. I think 
we should do the right thing and give everyone at least a few days and a 
weekend to finish getting their issues into 4.1.


- Mark

On Jan 10, 2013, at 2:36 PM, Steve Rowe  wrote:


I'd like to start sooner than next Tuesday.

I propose to make the branch tomorrow, and only allow Blocker issues to 
hold up the release after that.


A release candidate should then be possible by the middle of next week.

Steve

On Jan 10, 2013, at 2:27 PM, Mark Miller  wrote:



On Jan 10, 2013, at 2:12 PM, Steve Rowe  wrote:


I'd like to release soon.  What else blocks this?


I think we should toss out a short term date (next tuesday?) for anyone 
to get in what they need for 4.1.


Then just consider blockers after branching?

Then release?

Objections, better ideas?

I think we should give a bit of time for people to finish up what's in 
flight or fix any blockers. Then we should heighten testing and allow 
for any new blockers, and then kick it out. If we need to do a 4.2 
shortly after, so be it.


- Mark

[jira] [Updated] (SOLR-3982) Admin UI: Various Dataimport Improvements

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3982:


Attachment: SOLR-3982.patch

After a quick chat with [~elyograg], we decided to show the animated spinner 
only if "auto-refresh" is activated, otherwise the user might be confused.

> Admin UI: Various Dataimport Improvements
> -
>
> Key: SOLR-3982
> URL: https://issues.apache.org/jira/browse/SOLR-3982
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Shawn Heisey
>Assignee: Stefan Matheis (steffkes)
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-3982.patch, SOLR-3982.patch, SOLR-3982.patch
>
>
> Started with Shawn's request for a small refresh link; one change led to 
> the next, which is why I changed this issue into a more general one.
> This patch brings:
> * A "Refresh Status" button
> * An "Abort Import" button
> * Improved status handling
> _(this was buggy if you had multiple cores with dataimport handlers defined 
> and switched the view while at least one was running)_
> * Additional stats on rows/documents
> _(an on-the-fly calculated "X docs/second")_
> * A less buggy duration-to-readable-time conversion
> _(which until now could result in NaNs showing up on your screen)_
> Original Description:
> {quote}The dataimport section under each core on the admin gui does not 
> provide a way to get the current import status.  I actually would like to see 
> it automatically pull the status as soon as you click on "Dataimport" ... I 
> have never seen an import status with a qtime above 1 millisecond.  A refresh 
> icon/link would be good to have as well.
> Additional note: the resulting URL in the address bar is a little odd:
> http://server:port/solr/#/corename/dataimport//dataimport{quote}
> Though I already gave a short explanation of why the URL looks a bit odd:
> the first "dataimport" is required for the UI to detect which section you're 
> browsing ... the second "/dataimport" (including the slash, yes) comes 
> from your solrconfig :)




[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550538#comment-13550538
 ] 

Hoss Man commented on LUCENE-4134:
--

bq. What "other script" did you have in mind for the maven files?

I just meant whatever we currently do to push them to wherever we push them 
once the VOTE is official -- if that's currently bundled up in a script that 
also scp's the files to people.apache.org:/dist, then let's only worry about 
changing the people.apache.org part to start committing to svn, and worry about 
switching to RCs in svn and how we upload to maven from there later.



> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir used for publishing release artifacts also needs to be managed via svn.




[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Chyla updated LUCENE-4679:


Description: 
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \\W should stay 
uppercase, but it is lowercased.





  was:
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \\W should stay 
uppercase, but it will be lowercased.






> LowercaseExpandedTermsQueryNodeProcessor changes regex queries
> --
>
> Key: LUCENE-4679
> URL: https://issues.apache.org/jira/browse/LUCENE-4679
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Roman Chyla
>Priority: Trivial
> Attachments: LUCENE-4679.patch
>
>
> This is really a very silly request, but could the lowercase processor 
> 'abstain' from changing regex queries? For example, \\W should stay 
> uppercase, but it is lowercased.
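
A hypothetical illustration of the requested behavior (this is not the 
attached LUCENE-4679.patch): the processor could simply leave regexp nodes 
untouched:

{code}
// Hypothetical sketch, as it might appear in a processor like
// LowercaseExpandedTermsQueryNodeProcessor; not the attached patch.
protected QueryNode postProcessNode(QueryNode node) throws QueryNodeException {
  if (node instanceof RegexpQueryNode) {
    return node; // leave regex terms like \W untouched
  }
  // ... existing lowercasing of wildcard/fuzzy/range terms ...
  return node;
}
{code}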




[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Chyla updated LUCENE-4679:


Description: 
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \\W should stay 
uppercase, but it will be lowercased.





  was:
This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \\W should stay uppercase, 
but it will be lowercased.






> LowercaseExpandedTermsQueryNodeProcessor changes regex queries
> --
>
> Key: LUCENE-4679
> URL: https://issues.apache.org/jira/browse/LUCENE-4679
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Roman Chyla
>Priority: Trivial
> Attachments: LUCENE-4679.patch
>
>
> This is really a very silly request, but could the lowercase processor 
> 'abstain' from changing regex queries? For example, \\W should stay 
> uppercase, but it will be lowercased.




[jira] [Updated] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Roman Chyla updated LUCENE-4679:


Attachment: LUCENE-4679.patch

> LowercaseExpandedTermsQueryNodeProcessor changes regex queries
> --
>
> Key: LUCENE-4679
> URL: https://issues.apache.org/jira/browse/LUCENE-4679
> Project: Lucene - Core
>  Issue Type: Wish
>Reporter: Roman Chyla
>Priority: Trivial
> Attachments: LUCENE-4679.patch
>
>
> This is really a very silly request, but could the lowercase processor 
> 'abstain' from changing regex queries? For example, \\W should stay 
> uppercase, but it will be lowercased.




[jira] [Created] (LUCENE-4679) LowercaseExpandedTermsQueryNodeProcessor changes regex queries

2013-01-10 Thread Roman Chyla (JIRA)
Roman Chyla created LUCENE-4679:
---

 Summary: LowercaseExpandedTermsQueryNodeProcessor changes regex 
queries
 Key: LUCENE-4679
 URL: https://issues.apache.org/jira/browse/LUCENE-4679
 Project: Lucene - Core
  Issue Type: Wish
Reporter: Roman Chyla
Priority: Trivial


This is really a very silly request, but could the lowercase processor 
'abstain' from changing regex queries? For example, \\W should stay uppercase, 
but it will be lowercased.








[jira] [Commented] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Dawid Weiss (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550488#comment-13550488
 ] 

Dawid Weiss commented on LUCENE-4678:
-

This looks very cool! I looked at the patch briefly but I need to apply it to 
make sense of the whole picture. :) 
{code}
+  while (skip > 0) {
+    buffer.writeByte((byte) 0);
+    skip--;
+  }
{code}

This doesn't look particularly efficient, but I couldn't tell from the patch 
where it's actually used, so maybe it's all right.
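
One bulk alternative might look like this (illustrative only; whether it fits 
the actual call site depends on the context mentioned above):

{code}
import java.io.IOException;
import org.apache.lucene.store.DataOutput;

class ZeroFill {
  private static final byte[] ZEROS = new byte[256];

  // Writes `skip` zero bytes in chunks instead of one writeByte() per byte.
  static void skipZeros(DataOutput out, long skip) throws IOException {
    while (skip > 0) {
      int chunk = (int) Math.min(skip, ZEROS.length);
      out.writeBytes(ZEROS, 0, chunk);
      skip -= chunk;
    }
  }
}
{code}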

> FST should use paged byte[] instead of single contiguous byte[]
> ---
>
> Key: LUCENE-4678
> URL: https://issues.apache.org/jira/browse/LUCENE-4678
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4678.patch, LUCENE-4678.patch
>
>
> The single byte[] we use today has several limitations, e.g. it limits us to < 
> 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
> it causes big RAM spikes during building when the array has to grow.
> I took basically the same approach as LUCENE-3298, but I want to break out 
> this patch separately from changing all int -> long for > 2.1 GB support.
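
Schematically, paged storage addresses bytes like this (a sketch of the 
general technique, not the patch itself):

{code}
import java.util.ArrayList;
import java.util.List;

/** Sketch of the general paged-bytes technique; names are illustrative. */
class PagedBuffer {
  private static final int PAGE_BITS = 15; // 32 KB pages
  private static final int PAGE_SIZE = 1 << PAGE_BITS;
  private final List<byte[]> pages = new ArrayList<byte[]>();

  byte get(long pos) { // long addressing: no 2.1 GB ceiling
    return pages.get((int) (pos >>> PAGE_BITS))[(int) (pos & (PAGE_SIZE - 1))];
  }

  void grow() {
    pages.add(new byte[PAGE_SIZE]); // O(PAGE_SIZE); no copy, no RAM spike
  }
}
{code}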




[jira] [Commented] (SOLR-3755) shard splitting

2013-01-10 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550477#comment-13550477
 ] 

Mark Miller commented on SOLR-3755:
---

This has a back-compat break that we should address somehow, or at least 
mention in CHANGES: previously you could specify explicit shard ids and still 
get distributed updates; now if you do that, you won't get distrib updates, as 
shards won't be assigned ranges.
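
For background, distributed updates route each document by hash range, 
conceptually like this sketch (the types here are made up, not Solr's actual 
classes):

{code}
import java.util.Map;

class RangeRouting {
  // Made-up type standing in for a shard's assigned hash range.
  static class Range {
    final int min, max;
    Range(int min, int max) { this.min = min; this.max = max; }
    boolean includes(int hash) { return hash >= min && hash <= max; }
  }

  // Returns the shard whose range covers the doc's hash, or null when no
  // ranges were assigned (the back-compat case described above).
  static String shardFor(int hash, Map<String, Range> ranges) {
    for (Map.Entry<String, Range> e : ranges.entrySet()) {
      if (e.getValue().includes(hash)) {
        return e.getKey();
      }
    }
    return null;
  }
}
{code}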

> shard splitting
> ---
>
> Key: SOLR-3755
> URL: https://issues.apache.org/jira/browse/SOLR-3755
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
> Attachments: SOLR-3755.patch, SOLR-3755.patch
>
>
> We can currently easily add replicas to handle increases in query volume, but 
> we should also add a way to add additional shards dynamically by splitting 
> existing shards.




[jira] [Comment Edited] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer edited comment on LUCENE-3178 at 1/10/13 9:25 PM:
--

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do a memcpy in all cases, but often uses CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
Java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards. So far the only speedups I have seen are 
when I adopt most of the behaviors of the JVM; the biggest win really is that 
the code becomes a lot simpler (partly because we don't have to worry about 
the "cleaner", and partly because we are not bound to int32 sizes, so no more 
slice nonsense). Despite the simpler code, I don't think there is a sizable 
win in performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts, but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident.)

{quote}
I would not provide a custom MMapDir at all; it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree. Even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.

  was (Author: gbow...@fastmail.co.uk):
{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote a local Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}

Better: the JVM doesn't do a memcpy in all cases, but often uses CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
Java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards. So far the only speedups I have seen are 
when I adopt most of the behaviors of the JVM; the biggest win really is that 
the code becomes a lot simpler (partly because we don't have to worry about 
the "cleaner", and partly because we are not bound to int32 sizes, so no more 
slice nonsense). Despite the simpler code, I don't think there is a sizable 
win in performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior on the 
MADV_WILLNEED / page alignment fronts, but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident.)

{quote}
I would not provide a custom MMapDir at all; it is too risky and does not 
really bring a large speed up anymore (Java 7 + block postings).
{quote}
I quite agree. Even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default, and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.

[jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Greg Bowyer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550436#comment-13550436
 ] 

Greg Bowyer commented on LUCENE-3178:
-

{quote}
I think this is largely related to Robert's comment:
Might be interesting to revisit now that we use block compression that doesn't 
readByte(), readByte(), readByte() and hopefully avoids some of the bounds 
checks and so on that I think it helped with.
{quote}

Actually there is still quite a lot of that. I wrote locally a Directory 
implementation that dumps out all of the called operations; I can share the 
file if wanted (although it's *huge*).

{quote}
Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speed up at all for, e.g., NRQ.
{quote}
Better: the JVM doesn't do memcpy in all cases, but often does CPU-aware 
operations that are faster.

{quote}
Additionally, when using the ByteBuffer methods to get bytes, I think newer 
java versions use intrinsics, that may no longer be used with your directory 
impl.
{quote}

This is what I am leaning towards. So far the only speedups I have seen are 
when I ape most of the behaviors of the JVM; the biggest win really is that the 
code becomes a lot simpler (partly because we don't have to worry about the 
"cleaner", and partly because we are not bound to int32 sizes, so no more slice 
nonsense). Despite the simpler code, I don't think there is a sizable win in 
performance to warrant this approach.

I am still poking at this for a bit longer, but I am leaning towards calling 
this a bust.

The other reason for this was to see if I get better behavior along the 
MADV_WILLNEED / page alignment fronts; but again I have nothing scientifically 
provable there.

(This is all assuming that I don't have some gross oversight in my 
implementation that makes it stupid slow by accident.)

{quote}
I would not provide a custom MMapDir at all, it is too risky and does not 
really bring a large speedup anymore (Java 7 + block postings).
{quote}
I quite agree; even if this gave huge performance wins I would still put it in 
the bucket of "it's in misc, it's not default and you're on your own if it 
breaks". The fact that it yields AFAICT no performance gains is both maddening 
for me and even more damning.

> Native MMapDir
> --
>
> Key: LUCENE-3178
> URL: https://issues.apache.org/jira/browse/LUCENE-3178
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
> LUCENE-3178-Native-MMap-implementation.patch, 
> LUCENE-3178-Native-MMap-implementation.patch
>
>
> Spinoff from LUCENE-2793.
> Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
> level IO flags depending on the IOContext, we could in theory do something 
> similar with MMapDir.
> The problem is MMap is apparently quite hairy... and to pass the flags the 
> native code would need to invoke mmap (I think?), unlike UnixDir where the 
> code "only" has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4151) DIH 'debug' mode missing from 4.x UI

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) resolved SOLR-4151.
-

   Resolution: Duplicate
Fix Version/s: 4.1
 Assignee: Stefan Matheis (steffkes)

Marking as 'Duplicate'; not completely correct, but IMHO better than a (stupid) 
'Fixed'.

> DIH 'debug' mode missing from 4.x UI
> 
>
> Key: SOLR-4151
> URL: https://issues.apache.org/jira/browse/SOLR-4151
> Project: Solr
>  Issue Type: Bug
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Hoss Man
>Assignee: Stefan Matheis (steffkes)
> Fix For: 4.1
>
>
> The new Admin UI in trunk & 4.x supports most of the DIH related 
> functionality but the "debug" options were not implemented.
> http://wiki.apache.org/solr/DataImportHandler#Interactive_Development_Mode

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3982) Admin UI: Various Dataimport Improvements

2013-01-10 Thread Stefan Matheis (steffkes) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3982:


Attachment: SOLR-3982.patch

Updated patch incorporates SOLR-4151 (normally I try to handle issues 
separately, but this time it's easier to combine them).

Additionally changed:
* Show Info-Area also for 'idle' status
* Make Auto-Refresh optional via Checkbox
* Requests are now JSON and no longer XML 
_(Excluding the Configuration which is only available in XML)_

> Admin UI: Various Dataimport Improvements
> -
>
> Key: SOLR-3982
> URL: https://issues.apache.org/jira/browse/SOLR-3982
> Project: Solr
>  Issue Type: Improvement
>  Components: web gui
>Affects Versions: 4.0
>Reporter: Shawn Heisey
>Assignee: Stefan Matheis (steffkes)
> Fix For: 4.2, 5.0
>
> Attachments: SOLR-3982.patch, SOLR-3982.patch
>
>
> Started with Shawn's request about a small refresh link; one change led to 
> the next, which is why I changed this issue into a more general one.
> This Patch brings:
> * A "Refresh Status" Button
> * A "Abort Import" Button
> * Improved Status-Handling 
> _(was buggy if you have multiple Cores with Handlers for Dataimport defined 
> and you switched the view while at least one was running)_
> * Additional Stats on Rows/Documents
> _(on-the-fly calculated "X Docs/second")_
> * less buggy duration-to-readable-time conversion 
> _(until now it resulted in NaNs showing up on your screen)_
> Original Description:
> {quote}The dataimport section under each core on the admin gui does not 
> provide a way to get the current import status.  I actually would like to see 
> it automatically pull the status as soon as you click on "Dataimport" ... I 
> have never seen an import status with a qtime above 1 millisecond.  A refresh 
> icon/link would be good to have as well.
> Additional note: the resulting URL in the address bar is a little odd:
> http://server:port/solr/#/corename/dataimport//dataimport{quote}
> Although i gave a short explanation on the URL looking a bit odd:
> The first "dataimport" is required for the UI to detect which section you're 
> browsing .. the second "/dataimport" (including the slash, yes) is coming 
> from your solrconfig :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550406#comment-13550406
 ] 

Robert Muir commented on LUCENE-4134:
-

Personally I would prefer that we don't have a separate script for changing the 
maven files.

I'm not really sure what this tester is currently doing: but in my opinion if 
someone gets "Lucene 4.1" I should know WTF they got, regardless of whether 
it's from an FTP site or maven.

So if it doesn't exist now, at least in the future I'd like more logic 
cross-checking between the two things to ensure they are consistent with each 
other.

It's scary to me that different build systems are producing different artifacts 
and we don't have this cross-checking today.

And I know the checking isn't good enough when I see basic shit like things not 
even named the same way: SOLR-4287


> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550399#comment-13550399
 ] 

Steve Rowe commented on LUCENE-4134:


bq. Wouldn't another alternative be to just continue using our p.a.o/~, versus 
deploying to two places?

Yes, you're right: +1

bq. Then we can discuss/iterate on other changes to the release process at our 
leisure (i.e.: maybe we put the RCs in svn, and tweak the directory structure so 
a simple "svn mv" works for the dist files, and we have some other script for 
the maven files)

If the {{maven/}} directories weren't there, a simple "svn mv" would work - no 
other tweaking required.

What "other script" did you have in mind for the maven files?  Are you talking 
about the need to change the smoke tester if the maven artifacts are moved out 
of the RC?

> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Mark Miller
Saying "tomorrow", without any date that gives anyone time to do anything, is 
out of nowhere to me. People in Europe and east of there will wake up and find 
out: oh, today. While pressure has been building towards a release, no one has 
proposed a date for a cutoff, and I think proposing one is only fair. I think 
that if you were desperate to cut over to blockers-only tomorrow, you should 
have called for that last week.

Robert Muir's short-term releases are not threatened by allowing people to plan 
and execute a release together. You can take that too far and do damage from 
the opposite direction. Giving people time to tie things up, with a real 
deadline, is only fair. We all know a nebulous deadline is not conducive to 
finishing up work.

I think all releases should have a known date that we agree on, one that gives 
developers some time to finish what they are working on or what they believe is 
important for the release. At a minimum there should be a few days for this; 
including a weekend only seems fair. This doesn't have to be a long time, but it 
should not require that we file blockers, and it just seems like a friendly way 
to develop together.

Monday is fine by me if others buy into it.

Otherwise: we have taken 4 or 5 months for 4.1. Let's not drag it out another 
month. But let's not do the reverse and release it tonight. The sensible 
approach always seems to be planning out some target dates on the list - dates 
that actually give devs a chance to respond - and then following through on 
those dates.

- Mark

On Jan 10, 2013, at 3:26 PM, Steve Rowe  wrote:

> Okay - I can see your logic, Mark, but this is not even close to out of 
> nowhere.  You yourself have been vocal about making a 4.1 release for a 
> couple weeks now.
> 
> I agree with Robert Muir that we should be promoting short turnaround 
> releases.  If it doesn't make this release, it'll make the next one, which 
> will come out in a relatively short span of time.  In this model, Blocker 
> issues are the drivers, not "Fix Version". If people want stuff in the 
> release, they should mark their issue as Blocker.
> 
> How about a compromise - next Monday we branch and only allow Blockers to 
> block the release?
> 
> Steve
> 
> On Jan 10, 2013, at 3:08 PM, Mark Miller  wrote:
> 
>> -1 from me - I don't like not giving people a target date to clean things up 
>> by. No one has given a proposed date to try and tie things up by - just 
>> calling 'hike is tomorrow' out of nowhere doesn't seem right to me.
>> 
>> We have a lot of people working on this over a lot of timezones. I think we 
>> should do the right thing and give everyone at least a few days and a 
>> weekend to finish getting their issues into 4.1.
>> 
>> - Mark
>> 
>> On Jan 10, 2013, at 2:36 PM, Steve Rowe  wrote:
>> 
>>> I'd like to start sooner than next Tuesday.
>>> 
>>> I propose to make the branch tomorrow, and only allow Blocker issues to 
>>> hold up the release after that.
>>> 
>>> A release candidate should then be possible by the middle of next week.
>>> 
>>> Steve
>>> 
>>> On Jan 10, 2013, at 2:27 PM, Mark Miller  wrote:
>>> 
 
 On Jan 10, 2013, at 2:12 PM, Steve Rowe  wrote:
 
> I'd like to release soon.  What else blocks this?
 
 I think we should toss out a short term date (next tuesday?) for anyone to 
 get in what they need for 4.1.
 
 Then just consider blockers after branching?
 
 Then release?
 
 Objections, better ideas?
 
 I think we should give a bit of time for people to finish up what's in 
 flight or fix any blockers. Then we should heighten testing and allow for 
 any new blockers, and then kick it out. If we need to do a 4.2 shortly 
 after, so be it.
 
 - Mark
 -
 To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
 For additional commands, e-mail: dev-h...@lucene.apache.org
 
>>> 
>>> 
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550383#comment-13550383
 ] 

Hoss Man commented on LUCENE-4134:
--

bq. Wouldn't another alternative be to just continue using our p.a.o/~, versus 
deploying to two places?

+1

I would suggest that for now we move forward with the simplest possible changes 
to our overall processes that satisfies infra: using the new svn repo for our 
final release "dist", but leave everything else related to RCs, and smoke 
checking, as is.

Then we can discuss/iterate on other changes to the release process at our 
leisure (i.e.: maybe we put the RCs in svn, and tweak the directory structure so 
a simple "svn mv" works for the dist files, and we have some other script for 
the maven files)

> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4547) DocValues field broken on large indexes

2013-01-10 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-4547:


Priority: Major  (was: Blocker)

> DocValues field broken on large indexes
> ---
>
> Key: LUCENE-4547
> URL: https://issues.apache.org/jira/browse/LUCENE-4547
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.2, 5.0
>
> Attachments: test.patch
>
>
> I tried to write a test to sanity check LUCENE-4536 (first running against 
> svn revision 1406416, before the change).
But I found docvalues is already broken here for large indexes that have a 
PackedLongDocValues field:
> {code}
> final int numDocs = 5;
> for (int i = 0; i < numDocs; ++i) {
>   if (i == 0) {
> field.setLongValue(0L); // force > 32bit deltas
>   } else {
> field.setLongValue(1L << 33); 
>   }
>   w.addDocument(doc);
> }
> w.forceMerge(1);
> w.close();
> dir.close(); // checkindex
> {code}
> {noformat}
> [junit4:junit4]   2> WARNING: Uncaught exception in thread: Thread[Lucene 
> Merge Thread #0,6,TGRP-Test2GBDocValues]
> [junit4:junit4]   2> org.apache.lucene.index.MergePolicy$MergeException: 
> java.lang.ArrayIndexOutOfBoundsException: -65536
> [junit4:junit4]   2>  at 
> __randomizedtesting.SeedInfo.seed([5DC54DB14FA5979]:0)
> [junit4:junit4]   2>  at 
> org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:535)
> [junit4:junit4]   2>  at 
> org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:508)
> [junit4:junit4]   2> Caused by: java.lang.ArrayIndexOutOfBoundsException: 
> -65536
> [junit4:junit4]   2>  at 
> org.apache.lucene.util.ByteBlockPool.deref(ByteBlockPool.java:305)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.lucene40.values.FixedStraightBytesImpl$FixedBytesWriterBase.set(FixedStraightBytesImpl.java:115)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.writePackedInts(PackedIntValues.java:109)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.lucene40.values.PackedIntValues$PackedIntsWriter.finish(PackedIntValues.java:80)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.DocValuesConsumer.merge(DocValuesConsumer.java:130)
> [junit4:junit4]   2>  at 
> org.apache.lucene.codecs.PerDocConsumer.merge(PerDocConsumer.java:65)
> {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Steve Rowe
Okay - I can see your logic, Mark, but this is not even close to out of 
nowhere.  You yourself have been vocal about making a 4.1 release for a couple 
weeks now.

I agree with Robert Muir that we should be promoting short turnaround releases. 
 If it doesn't make this release, it'll make the next one, which will come out 
in a relatively short span of time.  In this model, Blocker issues are the 
drivers, not "Fix Version".If people want stuff in the release, they should 
mark their issue as Blocker.

How about a compromise - next Monday we branch and only allow Blockers to block 
the release?

Steve

On Jan 10, 2013, at 3:08 PM, Mark Miller  wrote:

> -1 from me - I don't like not giving people a target date to clean things up 
> by. No one has given a proposed date to try and tie things up by - just 
> calling 'hike is tomorrow' out of nowhere doesn't seem right to me.
> 
> We have a lot of people working on this over a lot of timezones. I think we 
> should do the right thing and give everyone at least a few days and a weekend 
> to finish getting their issues into 4.1.
> 
> - Mark
> 
> On Jan 10, 2013, at 2:36 PM, Steve Rowe  wrote:
> 
>> I'd like to start sooner than next Tuesday.
>> 
>> I propose to make the branch tomorrow, and only allow Blocker issues to hold 
>> up the release after that.
>> 
>> A release candidate should then be possible by the middle of next week.
>> 
>> Steve
>> 
>> On Jan 10, 2013, at 2:27 PM, Mark Miller  wrote:
>> 
>>> 
>>> On Jan 10, 2013, at 2:12 PM, Steve Rowe  wrote:
>>> 
 I'd like to release soon.  What else blocks this?
>>> 
>>> I think we should toss out a short term date (next tuesday?) for anyone to 
>>> get in what they need for 4.1.
>>> 
>>> Then just consider blockers after branching?
>>> 
>>> Then release?
>>> 
>>> Objections, better ideas?
>>> 
>>> I think we should give a bit of time for people to finish up what's in 
>>> flight or fix any blockers. Then we should heighten testing and allow for 
>>> any new blockers, and then kick it out. If we need to do a 4.2 shortly 
>>> after, so be it.
>>> 
>>> - Mark
>>> -
>>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: dev-h...@lucene.apache.org
>>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550373#comment-13550373
 ] 

Robert Muir commented on LUCENE-4134:
-

Wouldn't another alternative be to just continue using our p.a.o/~, versus 
deploying to two places?

I don't like having to check a "release" spread across two different places. 
And this would also make automatic 
verification difficult (today, we can pass the p.a.o link and it checks 
"everything")


> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir use for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-3298:
---

Attachment: LUCENE-3298.patch

Initial test to confirm FSTs can grow beyond 2GB (it fails today!).

> FST has hard limit max size of 2.1 GB
> -
>
> Key: LUCENE-3298
> URL: https://issues.apache.org/jira/browse/LUCENE-3298
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-3298.patch, LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is 
> indexed by int so we cannot grow this over Integer.MAX_VALUE.  It also 
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the maximum size far larger.
> But I think this is low priority... I'm not going to work on it any time soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3298) FST has hard limit max size of 2.1 GB

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3298:
--

Assignee: Michael McCandless

> FST has hard limit max size of 2.1 GB
> -
>
> Key: LUCENE-3298
> URL: https://issues.apache.org/jira/browse/LUCENE-3298
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
>Priority: Minor
> Attachments: LUCENE-3298.patch
>
>
> The FST uses a single contiguous byte[] under the hood, which in Java is 
> indexed by int so we cannot grow this over Integer.MAX_VALUE.  It also 
> internally encodes references to this array as vInt.
> We could switch this to a paged byte[] and make the maximum size far larger.
> But I think this is low priority... I'm not going to work on it any time soon.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4678:
---

Attachment: LUCENE-4678.patch

Duh, wrong patch ... this one should be right.

> FST should use paged byte[] instead of single contiguous byte[]
> ---
>
> Key: LUCENE-4678
> URL: https://issues.apache.org/jira/browse/LUCENE-4678
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4678.patch, LUCENE-4678.patch
>
>
> The single byte[] we use today has several limitations, e.g. it limits us to < 
> 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
> it causes big RAM spikes during building when the array has to grow.
> I took basically the same approach as LUCENE-3298, but I want to break out 
> this patch separately from changing all int -> long for > 2.1 GB support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4678:
---

Attachment: LUCENE-4678.patch

Patch, I think it's close to ready (no format change for the FST so no back 
compat).

> FST should use paged byte[] instead of single contiguous byte[]
> ---
>
> Key: LUCENE-4678
> URL: https://issues.apache.org/jira/browse/LUCENE-4678
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/FSTs
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4678.patch, LUCENE-4678.patch
>
>
> The single byte[] we use today has several limitations, e.g. it limits us to < 
> 2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
> it causes big RAM spikes during building when the array has to grow.
> I took basically the same approach as LUCENE-3298, but I want to break out 
> this patch separately from changing all int -> long for > 2.1 GB support.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4678) FST should use paged byte[] instead of single contiguous byte[]

2013-01-10 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4678:
--

 Summary: FST should use paged byte[] instead of single contiguous 
byte[]
 Key: LUCENE-4678
 URL: https://issues.apache.org/jira/browse/LUCENE-4678
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/FSTs
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.2, 5.0


The single byte[] we use today has several limitations, e.g. it limits us to < 
2.1 GB FSTs (and suggesters in the wild are getting close to this limit), and 
it causes big RAM spikes during building when the array has to grow.

I took basically the same approach as LUCENE-3298, but I want to break out this 
patch separately from changing all int -> long for > 2.1 GB support.
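
As a hedged sketch of the paged idea (the class and field names here are 
illustrative, not the patch's actual code): appending across fixed-size blocks 
means growth adds one block instead of reallocating and copying a single huge 
array, and a long address spans blocks naturally:

{code}
import java.util.ArrayList;
import java.util.List;

// Sketch: a paged byte store addressed by a long. Block size is a power of
// two so the block index and in-block offset are simple shifts/masks.
final class PagedByteStore {
  private static final int BLOCK_BITS = 15;             // 32 KB blocks
  private static final int BLOCK_SIZE = 1 << BLOCK_BITS;
  private static final int BLOCK_MASK = BLOCK_SIZE - 1;

  private final List<byte[]> blocks = new ArrayList<byte[]>();
  private long pos;                                     // global write position

  void writeByte(byte b) {
    if ((pos & BLOCK_MASK) == 0) {
      blocks.add(new byte[BLOCK_SIZE]);                 // grow by one block, no copy
    }
    blocks.get((int) (pos >>> BLOCK_BITS))[(int) (pos & BLOCK_MASK)] = b;
    pos++;
  }

  byte readByte(long addr) {
    return blocks.get((int) (addr >>> BLOCK_BITS))[(int) (addr & BLOCK_MASK)];
  }
}
{code}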

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550352#comment-13550352
 ] 

Steve Rowe edited comment on LUCENE-4134 at 1/10/13 8:09 PM:
-

bq. [A]s part of this new process there will also be a 
"https://dist.apache.org/repos/dist/dev/lucene"; directory where release 
candidates can be put for review (instead of 
people.apache.org/~releasemanager/...), and if/when they are voted successfully 
a simple "svn mv" to dist/release/lucene makes them official and pushes them to 
the mirrors.

There is a wrinkle here: maven artifacts.  Our current process includes them 
with the ASF release artifacts at the RC review download link.  If we continue 
this when we instead commit RCs to 
{{repos/dist/dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM/}}, then the release 
publishing process can't be just {{svn mv 
dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM release/lucene/\{java,solr}/X.Y.Z}}.  
Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm 
dev/lucene/\{java,solr}/X.Y.ZRCN-rMMM/maven}}.

An alternative: now that we stage maven artifacts to Nexus 
(repository.apache.org) prior to the release, we could as part of an RC 
announcement also include the Nexus link, and not include the maven artifacts 
in {{repos/dist/dev/lucene/}}.  This option gets my +1.

  was (Author: steve_rowe):
bq. [A]s part of this new process there will also be a 
"https://dist.apache.org/repos/dist/dev/lucene"; directory where release 
candidates can be put for review (instead of 
people.apache.org/~releasemanager/...), and if/when they are voted successfully 
a simple "svn mv" to dist/release/lucene makes them official and pushes them to 
the mirrors.

There is a wrinkle here: maven artifacts.  Our current process includes them 
with the ASF release artifacts at the RC review download link.  If we continue 
this when we instead commit RCs to 
{{repos/dist/dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/}}, then the release 
publishing process can't be just {{svn mv 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM release/lucene/{java,solr}/X.Y.Z}}.  
Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/maven}}.

An alternative: now that we stage maven artifacts to Nexus 
(repository.apache.org) prior to the release, we could as part of an RC 
announcement also include the Nexus link.  This option gets my +1.
  
> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir use for publishing release artifacts also needs to be manaved via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Mark Miller
-1 from me - I don't like not giving people a target date to clean things up 
by. No one has given a proposed date to try and tie things up by - just calling 
'hike is tomorrow' out of nowhere doesn't seem right to me.

We have a lot of people working on this over a lot of timezones. I think we 
should do the right thing and give everyone at least a few days and a weekend 
to finish getting their issues into 4.1.

- Mark

On Jan 10, 2013, at 2:36 PM, Steve Rowe  wrote:

> I'd like to start sooner than next Tuesday.
> 
> I propose to make the branch tomorrow, and only allow Blocker issues to hold 
> up the release after that.
> 
> A release candidate should then be possible by the middle of next week.
> 
> Steve
> 
> On Jan 10, 2013, at 2:27 PM, Mark Miller  wrote:
> 
>> 
>> On Jan 10, 2013, at 2:12 PM, Steve Rowe  wrote:
>> 
>>> I'd like to release soon.  What else blocks this?
>> 
>> I think we should toss out a short term date (next tuesday?) for anyone to 
>> get in what they need for 4.1.
>> 
>> Then just consider blockers after branching?
>> 
>> Then release?
>> 
>> Objections, better ideas?
>> 
>> I think we should give a bit of time for people to finish up what's in 
>> flight or fix any blockers. Then we should heighten testing and allow for 
>> any new blockers, and then kick it out. If we need to do a 4.2 shortly 
>> after, so be it.
>> 
>> - Mark
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550352#comment-13550352
 ] 

Steve Rowe commented on LUCENE-4134:


bq. [A]s part of this new process there will also be a 
"https://dist.apache.org/repos/dist/dev/lucene"; directory where release 
candidates can be put for review (instead of 
people.apache.org/~releasemanager/...), and if/when they are voted successfully 
a simple "svn mv" to dist/release/lucene makes them official and pushes them to 
the mirrors.

There is a wrinkle here: maven artifacts.  Our current process includes them 
with the ASF release artifacts at the RC review download link.  If we continue 
this when we instead commit RCs to 
{{repos/dist/dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/}}, then the release 
publishing process can't be just {{svn mv 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM release/lucene/{java,solr}/X.Y.Z}}.  
Instead, we'll have to somehow exclude the maven artifacts, e.g. {{svn rm 
dev/lucene/{java,solr}/X.Y.ZRCN-rMMM/maven}}.

An alternative: now that we stage maven artifacts to Nexus 
(repository.apache.org) prior to the release, we could as part of an RC 
announcement also include the Nexus link.  This option gets my +1.

> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir use for publishing release artifacts also needs to be manaved via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-4286.


Resolution: Duplicate
  Assignee: (was: Shalin Shekhar Mangar)

> Atomic Updates on multi-valued fields giving unexpected results
> ---
>
> Key: SOLR-4286
> URL: https://issues.apache.org/jira/browse/SOLR-4286
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: Windows 7 64-bit
>Reporter: Abhinav Shah
>Priority: Blocker
>
> I am using apache-solr 4.0.
> I am trying to post the following document - 
> {code}
> curl http://irvis016:8983/solr/collection1/update?commit=true -H 
> "Content-Type: text/xml" --data-binary ' boost="1.0">3165297 name="status" update="set">ORDERED update="set">US LABS DEMO ACCOUNT name="account.addresses.address1" update="set">2601 Campus 
> Drive update="set">Irvine update="set">CA update="set">92622 update="set">10442 update="set">60086 update="set">5571351625769103 name="patient.patientName.lastName" update="set">test name="patient.patientName.firstName" update="set">test123 name="patient.patientSSN" update="set">643522342 name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z name="patient.mrNs.mrn" update="set">5423 name="specimens.specimenType" update="set">Bone Marrow name="specimens.specimenType" update="set">Nerve tissue name="UID">3165297USLABS2012'
> {code}
> This document gets successfully posted. However, the multi-valued field 
> 'specimens.specimenType', gets stored as following in SOLR -
> {code}
> 
> {set=Bone Marrow}
> {set=Nerve tissue}
> 
> {code}
> I did not expect "{set=" to be stored along with the text "Bone Marrow".
> My Solr schema XML definition for the field specimens.specimenType is - 
> {code}
> <field name="specimens.specimenType" omitNorms="false" omitPositions="true" 
> omitTermFreqAndPositions="true" stored="true" termVectors="false" 
> type="text_en"/>
> {code}
> Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-4677:
---

Attachment: LUCENE-4677.patch

Initial patch ... not committable until I add a back-compat layer somehow ... 
(how come TestBackCompat isn't failing...).

I tested Kuromoji's TokenInfo FST, temporarily turning off packing: vInt 
encoding made the non-packed FST ~12% smaller (good!).  The packed FST is 
unchanged in size.

Then I tested on a bigger FST (AnalyzingSuggester build of FreeDB's song 
titles) and the resulting FST is nearly the same size (1.0463 GB for trunk and 
1.0458 with patch).
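
For context, a vInt stores 7 payload bits per byte with the high bit as a 
continuation flag, so small node addresses shrink from a fixed 4 bytes to 1-2 
bytes; the helper below is only a sketch of the standard scheme, not the patch 
itself:

{code}
// Sketch: write 'value' as a vInt into buf starting at pos; returns the new
// write position. Values < 128 take 1 byte, < 16384 take 2 bytes, and so on.
static int writeVInt(byte[] buf, int pos, int value) {
  while ((value & ~0x7F) != 0) {
    buf[pos++] = (byte) ((value & 0x7F) | 0x80);  // low 7 bits + continuation bit
    value >>>= 7;
  }
  buf[pos++] = (byte) value;                       // final byte, high bit clear
  return pos;
}
{code}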

> Use vInt to encode node addresses inside FST
> 
>
> Key: LUCENE-4677
> URL: https://issues.apache.org/jira/browse/LUCENE-4677
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
> Attachments: LUCENE-4677.patch
>
>
> Today we use int, but towards enabling > 2.1G sized FSTs, I'd like to make 
> this vInt instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Abhinav Shah (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550245#comment-13550245
 ] 

Abhinav Shah commented on SOLR-4286:


I tried the nightly build - apache-solr-4.1-2013-01-10_05-50-28.zip - and it 
works.

Thanks

> Atomic Updates on multi-valued fields giving unexpected results
> ---
>
> Key: SOLR-4286
> URL: https://issues.apache.org/jira/browse/SOLR-4286
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: Windows 7 64-bit
>Reporter: Abhinav Shah
>Assignee: Shalin Shekhar Mangar
>Priority: Blocker
>
> I am using apache-solr 4.0.
> I am trying to post the following document - 
> {code}
> curl http://irvis016:8983/solr/collection1/update?commit=true -H 
> "Content-Type: text/xml" --data-binary ' boost="1.0">3165297 name="status" update="set">ORDERED update="set">US LABS DEMO ACCOUNT name="account.addresses.address1" update="set">2601 Campus 
> Drive update="set">Irvine update="set">CA update="set">92622 update="set">10442 update="set">60086 update="set">5571351625769103 name="patient.patientName.lastName" update="set">test name="patient.patientName.firstName" update="set">test123 name="patient.patientSSN" update="set">643522342 name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z name="patient.mrNs.mrn" update="set">5423 name="specimens.specimenType" update="set">Bone Marrow name="specimens.specimenType" update="set">Nerve tissue name="UID">3165297USLABS2012'
> {code}
> This document gets successfully posted. However, the multi-valued field 
> 'specimens.specimenType', gets stored as following in SOLR -
> {code}
> 
> {set=Bone Marrow}
> {set=Nerve tissue}
> 
> {code}
> I did not expect "{set=" to be stored along with the text "Bone Marrow".
> My Solr schema XML definition for the field specimens.specimenType is - 
> {code}
> <field name="specimens.specimenType" omitNorms="false" omitPositions="true" 
> omitTermFreqAndPositions="true" stored="true" termVectors="false" 
> type="text_en"/>
> {code}
> Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-10 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-4677:
--

Assignee: Michael McCandless

> Use vInt to encode node addresses inside FST
> 
>
> Key: LUCENE-4677
> URL: https://issues.apache.org/jira/browse/LUCENE-4677
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.2, 5.0
>
>
> Today we use int, but towards enabling > 2.1G sized FSTs, I'd like to make 
> this vInt instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4677) Use vInt to encode node addresses inside FST

2013-01-10 Thread Michael McCandless (JIRA)
Michael McCandless created LUCENE-4677:
--

 Summary: Use vInt to encode node addresses inside FST
 Key: LUCENE-4677
 URL: https://issues.apache.org/jira/browse/LUCENE-4677
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.2, 5.0


Today we use int, but towards enabling > 2.1G sized FSTs, I'd like to make this 
vInt instead.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549975#comment-13549975
 ] 

Michael McCandless commented on LUCENE-4620:


Trunk:
{noformat}
 [java] Estimating ~1 Integers compression time by
 [java] Encoding/decoding facets' ID payload of docID = 3630 (unsorted, length of: 2430) 41152 times.
 [java]
 [java] Encoder                                                   Bits/Int  Encode [ms]  Encode [us/int]  Decode [ms]  Decode [us/int]
 [java] --------------------------------------------------------------------------------------------------------------------------------
 [java] VInt8                                                      18.4955         4430          44.3003         1162          11.6201
 [java] Sorting (Unique (VInt8))                                   18.4955         4344          43.4403         1105          11.0501
 [java] Sorting (Unique (DGap (VInt8)))                             8.5597         4481          44.8103          842           8.4201
 [java] Sorting (Unique (DGap (EightFlags (VInt8))))                4.9679         4636          46.3603         1021          10.2101
 [java] Sorting (Unique (DGap (FourFlags (VInt8))))                 4.8198         4515          45.1503         1001          10.0101
 [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))     4.5794         4904          49.0403         1056          10.5601
 [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))     4.5794         4751          47.5103         1035          10.3501
 [java]
 [java]
 [java] Estimating ~1 Integers compression time by
 [java] Encoding/decoding facets' ID payload of docID = 9910 (unsorted, length of: 1489) 67159 times.
 [java]
 [java] Encoder                                                   Bits/Int  Encode [ms]  Encode [us/int]  Decode [ms]  Decode [us/int]
 [java] --------------------------------------------------------------------------------------------------------------------------------
 [java] VInt8                                                      18.2673         1241          12.4100         1128          11.2800
 [java] Sorting (Unique (VInt8))                                   18.2673         3488          34.8801          924           9.2400
 [java] Sorting (Unique (DGap (VInt8)))                             8.9456         3061          30.6101          660           6.6000
 [java] Sorting (Unique (DGap (EightFlags (VInt8))))                5.7542         3693          36.9301         1026          10.2600
 [java] Sorting (Unique (DGap (FourFlags (VInt8))))                 5.5447         3462          34.6201          811           8.1100
 [java] Sorting (Unique (DGap (NOnes (3) (FourFlags (VInt8)))))     5.3566         3846          38.4601         1018          10.1800
 [java] Sorting (Unique (DGap (NOnes (4) (FourFlags (VInt8)))))     5.3996         3879          38.7901         1025          10.2500
 [java]
 [java]
 [java] Estimating ~1 Integers compression time by
 [java] Encoding/decoding facets' ID payload of docID = 1 (unsorted, length of: 18) 555 times.
 [java]
 [java] Encoder                                                   Bits/Int  Encode [ms]  Encode [us/int]  Decode [ms]  Decode [us/int]
 [java] --------------------------------------------------------------------------------------------------------------------------------
 [java] VInt8                                                      20.8889         1179          11.7900         1114          11.1400
 [java] Sorting (Unique (VInt8))                                   20.8889         2251          22.5100         1171          11.7100
 [java] Sorting (Unique (DGap (VInt8)))                            12.0000         2174          21.7400          848           8.4800
 [java] Sorting (Unique (DGap (EightFlags (VInt8))))               10.0000         2372          23.7200         1092          1

[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #734: POMs out of sync

2013-01-10 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/734/

1 tests failed.
FAILED:  org.apache.solr.cloud.SyncSliceTest.testDistribSearch

Error Message:
shard1 should have just been set up to be inconsistent - but it's still 
consistent

Stack Trace:
java.lang.AssertionError: shard1 should have just been set up to be 
inconsistent - but it's still consistent
at 
__randomizedtesting.SeedInfo.seed([5A32B9FE8374BE51:DBD437E6F42BDE6D]:0)
at org.junit.Assert.fail(Assert.java:93)
at org.junit.Assert.assertTrue(Assert.java:43)
at org.junit.Assert.assertNotNull(Assert.java:526)
at org.apache.solr.cloud.SyncSliceTest.doTest(SyncSliceTest.java:214)
at 
org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:794)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:737)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:773)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358)
 

Re: 4.1 release

2013-01-10 Thread Steve Rowe
I'd like to start sooner than next Tuesday.

I propose to make the branch tomorrow, and only allow Blocker issues to hold up 
the release after that.

A release candidate should then be possible by the middle of next week.

Steve

On Jan 10, 2013, at 2:27 PM, Mark Miller  wrote:

> 
> On Jan 10, 2013, at 2:12 PM, Steve Rowe  wrote:
> 
>> I'd like to release soon.  What else blocks this?
> 
> I think we should toss out a short term date (next tuesday?) for anyone to 
> get in what they need for 4.1.
> 
> Then just consider blockers after branching?
> 
> Then release?
> 
> Objections, better ideas?
> 
> I think we should give a bit of time for people to finish up what's in flight 
> or fix any blockers. Then we should heighten testing and allow for any new 
> blockers, and then kick it out. If we need to do a 4.2 shortly after, so be 
> it.
> 
> - Mark
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549961#comment-13549961
 ] 

Michael McCandless commented on LUCENE-4620:


{quote}
bq. Can we use Collections.singletonMap when there are no partitions?

Done. Note though that BytesRef cannot be reused in the case of 
PerDimensionIndexingParams (i.e. multiple CLPs). This is not the common case, 
but it's not trivial to specialize it. Maybe as a second iteration. I did put a 
TODO in FacetFields to allow reuse.
{quote}

Well, we'd somehow need N BytesRefs to reuse (one per CLP) ... but I
don't think we should worry about that now.

It is unfortunate that the common case is often held back by the full
flexibility/generality of the facet module ... sometimes I think we
need a facet-light module.  But maybe if we can get the specialization
done we don't need facet-light ...

{quote}
bq. why do we have VInt8.bytesNeeded? Who uses that?

Currently no one uses it, but it was there and I thought that it's a convenient 
API to keep. Why encode and then see how many bytes were occupied?
Anyway, neither the encoders nor the decoders use it. I have no strong feelings 
for keeping/removing it, so if you feel like it should be removed, I can do it.
{quote}

I think we should remove it: it's a dangerous API because it can
encourage consumers to do things like call bytesNeeded first (to know
how much to grow their buffer, say) followed by encoding.  The slow
part of vInt encoding is all those ifs ...
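
To make that concrete, a rough sketch (an assumed shape, not the actual VInt8
API) of an encode that simply reports how many bytes it wrote:

{code}
// Hedged sketch, not the real VInt8 class: encode() returns the number of
// bytes it wrote, so callers measure after encoding instead of running the
// branchy size computation twice via a bytesNeeded()-style call.
static int encode(int value, byte[] buf, int offset) {
  final int start = offset;
  while ((value & ~0x7F) != 0) {                    // more than 7 bits left
    buf[offset++] = (byte) ((value & 0x7F) | 0x80); // low 7 bits + continuation
    value >>>= 7;
  }
  buf[offset++] = (byte) value;                     // last byte, high bit clear
  return offset - start;                            // bytes written
}
{code}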

{quote}
bq. Hmm, it's a little abusive how VInt8.decode changes the offset of the 
incoming BytesRef

It is, but that's the result of Java's lack of pass by reference. I.e., decode 
needs to return the caller two values: the decoded number and how many bytes 
were read.
Notice that in the previous byte[] variant, the method took a class Position, 
which is horrible. That's why I documented in decode() that it advances 
bytes.offset, so
the caller can restore it in the end. For instance, IntDecoder restores the 
offset to the original one in the end.

On LUCENE-4675 Robert gave me an idea to create a BytesRefIterator, and I 
started to play with it. I.e. it would wrap a BytesRef but add 'pos' and 'upto' 
indexes.
The user can modify 'pos' freely, without touching bytes.offset. That 
introduces an object allocation though, and since I'd want to reuse that object 
wherever
possible, I think I'll look at it after finishing this issue. It already 
contains too many changes.
{quote}

OK.

{quote}
bq. I guess this is why you want an upto

No, I wanted upto because iterating up to bytes.length is incorrect. You need 
to iterate up to offset+length. BytesRefIterator.pos and BytesRefIterator.upto 
solve these cases for me.
{quote}

OK.

{quote}
bq. looks like things got a bit slower (or possibly it's noise)

First, even if it's not noise, the slowdown IMO is worth the code 
simplification.
{quote}

+1

{quote}
But, I do believe that we'll see gains when there are more than 3 integers to 
encode/decode.
In fact, the facets test package has an EncodingSpeed class which measures the 
time it takes to encode/decode a large number of integers (a few thousand). 
When I compared the result to 4x (i.e. without the patch), the decode time 
seemed to be ~5x faster.
{quote}

Good!  Would be nice to have a real-world biggish-number-of-facets
benchmark ... I'll ponder how to do that w/ luceneutil.

bq. In this patch I added an Ant task "run-encoding-benchmark" which runs this 
class. Want to give it a try on your beast machine? For 4x, you can just copy 
the target to lucene/facet/build.xml, I believe it will work without issues.

OK I'll run it!


> Explore IntEncoder/Decoder bulk API
> ---
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so may decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.

Re: 4.1 release

2013-01-10 Thread Mark Miller

On Jan 10, 2013, at 2:12 PM, Steve Rowe  wrote:

> I'd like to release soon.  What else blocks this?

I think we should toss out a short term date (next tuesday?) for anyone to get 
in what they need for 4.1.

Then just consider blockers after branching?

Then release?

Objections, better ideas?

I think we should give a bit of time for people to finish up what's in flight 
or fix any blockers. Then we should heighten testing and allow for any new 
blockers, and then kick it out. If we need to do a 4.2 shortly after, so be it.

- Mark
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Robert Muir
On Thu, Jan 10, 2013 at 11:12 AM, Steve Rowe  wrote:
>
> LUCENE-4547  (DocValues 
> 2.0) is listed as Blocker with Fix Version including 4.2, but recent commits 
> to branches/lucene4547/ include changes to the Lucene41 codec.  Looks like 
> Fix Version should be changed to 4.1?
>

This is a pretty bad bug (you cannot use docvalues with large
segments: I initially made it blocker for that reason), but I think we
are making good progress at a good pace.

My personal opinion: it's fine to just move it out to 4.2; I'd rather
have the time to get everything nice. A 4.1 would be an improvement on
its own, even if there are known problems like that.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549957#comment-13549957
 ] 

Yonik Seeley commented on SOLR-4286:


Hopefully this is already fixed.  Can you try a recent nightly build of 4x 
(soon to become 4.1)?
http://wiki.apache.org/solr/NightlyBuilds

> Atomic Updates on multi-valued fields giving unexpected results
> ---
>
> Key: SOLR-4286
> URL: https://issues.apache.org/jira/browse/SOLR-4286
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: Windows 7 64-bit
>Reporter: Abhinav Shah
>Assignee: Shalin Shekhar Mangar
>Priority: Blocker
>
> I am using apache-solr 4.0.
> I am trying to post the following document - 
> {code}
> curl http://irvis016:8983/solr/collection1/update?commit=true -H 
> "Content-Type: text/xml" --data-binary ' boost="1.0">3165297 name="status" update="set">ORDERED update="set">US LABS DEMO ACCOUNT name="account.addresses.address1" update="set">2601 Campus 
> Drive update="set">Irvine update="set">CA update="set">92622 update="set">10442 update="set">60086 update="set">5571351625769103 name="patient.patientName.lastName" update="set">test name="patient.patientName.firstName" update="set">test123 name="patient.patientSSN" update="set">643522342 name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z name="patient.mrNs.mrn" update="set">5423 name="specimens.specimenType" update="set">Bone Marrow name="specimens.specimenType" update="set">Nerve tissue name="UID">3165297USLABS2012'
> {code}
> This document gets successfully posted. However, the multi-valued field 
> 'specimens.specimenType' gets stored as follows in Solr -
> {code}
> <arr name="specimens.specimenType">
> <str>{set=Bone Marrow}</str>
> <str>{set=Nerve tissue}</str>
> </arr>
> {code}
> I did not expect "{set=" to be stored along with the text "Bone Marrow".
> My Solr schema xml definition for the field specimens.specimenType is - 
> {code}
> <field indexed="true" multiValued="true" name="specimens.specimenType" 
> omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" 
> stored="true" termVectors="false" type="text_en"/>
> {code}
> Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Erik Hatcher
I set a couple of others to Blocker just now, which are related, probably dups. 
 Shalin is assigned to them both.

   Solr 4 atomic update incorrect value when setting two or more values to a 
multivalue via XML update
   https://issues.apache.org/jira/browse/SOLR-4294

and 

   Atomic Updates on multi-valued fields giving unexpected results
   https://issues.apache.org/jira/browse/SOLR-4286

Hopefully these aren't too bad and can make it in as well.

Erik

  

On Jan 10, 2013, at 14:12, Steve Rowe wrote:

> As of now, there are two Blocker issues in JIRA with Fix Version 4.1: 
> 
>   Dataimporting with SolrCloud Fails
>   https://issues.apache.org/jira/browse/SOLR-4112
> 
>   modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
>   https://issues.apache.org/jira/browse/LUCENE-4134
> 
> (LUCENE-4431 - servlet-api.jar licensing - is listed as Blocker with Fix 
> Version including 4.1, but this has been fixed in branch_4x, and was reopened 
> only for 3.6.X backporting.)  
> 
> LUCENE-4547  (DocValues 
> 2.0) is listed as Blocker with Fix Version including 4.2, but recent commits 
> to branches/lucene4547/ include changes to the Lucene41 codec.  Looks like 
> Fix Version should be changed to 4.1?
> 
> I'd like to release soon.  What else blocks this?
> 
> Steve
> 
> On Dec 31, 2012, at 2:08 PM, Mark Miller  wrote:
> 
>> I've started pushing on JIRA issue for a 4.1 release.
>> 
>> If something is pushed that you are going to work on in the very near term, 
>> please put it back.
>> 
>> I'll progressively get more aggressive about pushing and count on committers 
>> to fix any mistakes if they want something in 4.1.
>> 
>> Remember, 4.2 can come shortly after 4.1.
>> 
>> Next I will be pushing any 4.1 issues that have not been updated in a couple 
>> months.
>> 
>> - Mark
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
>> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update

2013-01-10 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-4294:
---

Assignee: Shalin Shekhar Mangar

> Solr 4 atomic update incorrect value when setting two or more values to a 
> multivalue via XML update
> ---
>
> Key: SOLR-4294
> URL: https://issues.apache.org/jira/browse/SOLR-4294
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, update
>Affects Versions: 4.0
> Environment: RHEL
>Reporter: Ben Pennell
>Assignee: Shalin Shekhar Mangar
>Priority: Blocker
> Fix For: 4.0.1, 4.1
>
>
> Setting multiple values to a multivalued field via an XML atomic update 
> request is resulting in what appears to be the output of a toString() method. 
>  See the examples below.
> I ran into this issue using the output for atomic updates from the fix for 
> Solr-4133 to ClientUtils.  The server being used is the base 4.0.0 release.
> {code}
> curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' 
> -d '
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">one</field>
> <field name="status" update="set">two</field>
> </doc></add>'
> {code}
> Yields the following in Solr:
> {code}
>   {set=one}{set=two}
> {code}
> Changing the second "set" to an "add" has the same effect.
>   If I only set one value though, it works correctly:
> {code}
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">one</field>
> </doc></add>
> {code}
>   Yields:
> {code}
> one
> {code}
>   It also works fine if I split it into two operations
> {code}
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">one</field>
> </doc></add>
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">two</field>
> </doc></add>
> {code}
>   Yields:
> {code}
> onetwo
> {code}
>   Oddly, it works fine as a single request in JSON:
> {code}
> curl -k 'http://localhost/solr/update?commit=true' -H 
> 'Content-type:application/json' -d '["id":"test", {"status":{"set":["one", 
> "two"]}}]'
> {code}
>   Yields:
> {code}
> onetwo
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update

2013-01-10 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-4294:
---

Priority: Blocker  (was: Minor)

> Solr 4 atomic update incorrect value when setting two or more values to a 
> multivalue via XML update
> ---
>
> Key: SOLR-4294
> URL: https://issues.apache.org/jira/browse/SOLR-4294
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, update
>Affects Versions: 4.0
> Environment: RHEL
>Reporter: Ben Pennell
>Priority: Blocker
> Fix For: 4.0.1, 4.1
>
>
> Setting multiple values to a multivalued field via an XML atomic update 
> request is resulting in what appears to be the output of a toString() method. 
>  See the examples below.
> I ran into this issue using the output for atomic updates from the fix for 
> Solr-4133 to ClientUtils.  The server being used is the base 4.0.0 release.
> {code}
> curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' 
> -d '
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">one</field>
> <field name="status" update="set">two</field>
> </doc></add>'
> {code}
> Yields the following in Solr:
> {code}
>   {set=one}{set=two}
> {code}
> Changing the second "set" to an "add" has the same effect.
>   If I only set one value though, it works correctly:
> {code}
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">one</field>
> </doc></add>
> {code}
>   Yields:
> {code}
> one
> {code}
>   It also works fine if I split it into two operations
> {code}
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">one</field>
> </doc></add>
> <add><doc>
> <field name="id">test</field>
> <field name="status" update="set">two</field>
> </doc></add>
> {code}
>   Yields:
> {code}
> onetwo
> {code}
>   Oddly, it works fine as a single request in JSON:
> {code}
> curl -k 'http://localhost/solr/update?commit=true' -H 
> 'Content-type:application/json' -d '["id":"test", {"status":{"set":["one", 
> "two"]}}]'
> {code}
>   Yields:
> {code}
> onetwo
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4286) Atomic Updates on multi-valued fields giving unexpected results

2013-01-10 Thread Erik Hatcher (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erik Hatcher updated SOLR-4286:
---

Priority: Blocker  (was: Major)

> Atomic Updates on multi-valued fields giving unexpected results
> ---
>
> Key: SOLR-4286
> URL: https://issues.apache.org/jira/browse/SOLR-4286
> Project: Solr
>  Issue Type: Bug
>  Components: update
>Affects Versions: 4.0
> Environment: Windows 7 64-bit
>Reporter: Abhinav Shah
>Assignee: Shalin Shekhar Mangar
>Priority: Blocker
>
> I am using apache-solr 4.0.
> I am trying to post the following document - 
> {code}
> curl http://irvis016:8983/solr/collection1/update?commit=true -H 
> "Content-Type: text/xml" --data-binary ' boost="1.0">3165297 name="status" update="set">ORDERED update="set">US LABS DEMO ACCOUNT name="account.addresses.address1" update="set">2601 Campus 
> Drive update="set">Irvine update="set">CA update="set">92622 update="set">10442 update="set">60086 update="set">5571351625769103 name="patient.patientName.lastName" update="set">test name="patient.patientName.firstName" update="set">test123 name="patient.patientSSN" update="set">643522342 name="patient.patientDOB" update="set">1979-11-11T08:00:00.000Z name="patient.mrNs.mrn" update="set">5423 name="specimens.specimenType" update="set">Bone Marrow name="specimens.specimenType" update="set">Nerve tissue name="UID">3165297USLABS2012'
> {code}
> This document gets successfully posted. However, the multi-valued field 
> 'specimens.specimenType' gets stored as follows in Solr -
> {code}
> <arr name="specimens.specimenType">
> <str>{set=Bone Marrow}</str>
> <str>{set=Nerve tissue}</str>
> </arr>
> {code}
> I did not expect "{set=" to be stored along with the text "Bone Marrow".
> My Solr schema xml definition for the field specimens.specimenType is - 
> {code}
> <field indexed="true" multiValued="true" name="specimens.specimenType" 
> omitNorms="false" omitPositions="true" omitTermFreqAndPositions="true" 
> stored="true" termVectors="false" type="text_en"/>
> {code}
> Can someone help?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4620:
---

Attachment: LUCENE-4620.patch

bq. Can we use Collections.singletonMap when there are no partitions?

Done. Note though that BytesRef cannot be reused in the case of 
PerDimensionIndexingParams (i.e. multiple CLPs). This is not the common case, 
but it's not trivial to specialize it. Maybe as a second iteration. I did put a 
TODO in FacetFields to allow reuse.
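
To illustrate the reuse idea, a minimal sketch with invented names (not the
actual FacetFields internals):

{code}
import java.util.Collections;
import java.util.Map;
import org.apache.lucene.util.BytesRef;

// Minimal sketch, assuming a single CLP and no partitions: a singleton map
// over one reusable BytesRef replaces a fresh HashMap per document. The
// class and the "$facets" field name are illustrative only.
class OrdinalEncoderSketch {
  private final BytesRef reusableBytes = new BytesRef(64);

  Map<String, BytesRef> encodedOrdinals() {
    // ... fill reusableBytes with this document's encoded ordinals ...
    return Collections.singletonMap("$facets", reusableBytes);
  }
}
{code}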

bq. why do we have VInt8.bytesNeeded? Who uses that?

Currently no one uses it, but it was there and I thought that it's a convenient 
API to keep. Why encode and then see how many bytes were occupied?
Anyway, neither the encoders nor the decoders use it. I have no strong feelings 
for keeping/removing it, so if you feel like it should be removed, I can do it.

bq. Hmm, it's a little abusive how VInt8.decode changes the offset of the 
incoming BytesRef

It is, but that's the result of Java's lack of pass by reference. I.e., decode 
needs to return the caller two values: the decoded number and how many bytes 
were read.
Notice that in the previous byte[] variant, the method took a class Position, 
which is horrible. That's why I documented in decode() that it advances 
bytes.offset, so
the caller can restore it in the end. For instance, IntDecoder restores the 
offset to the original one in the end.
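
A skeletal version of that contract (illustrative, not the exact Lucene 4.x
signatures):

{code}
import org.apache.lucene.util.BytesRef;

// Sketch of the documented contract: decode() advances bytes.offset as it
// reads and returns the value; the caller infers how many bytes were
// consumed from the offset delta and restores the original offset at the
// end, as IntDecoder does. Not the exact VInt8 code.
class VInt8Sketch {
  static int decode(BytesRef bytes) {
    int value = 0, shift = 0;
    byte b;
    do {
      b = bytes.bytes[bytes.offset++]; // side effect: offset moves forward
      value |= (b & 0x7F) << shift;
      shift += 7;
    } while ((b & 0x80) != 0);
    return value;
  }
}
{code}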

On LUCENE-4675 Robert gave me an idea to create a BytesRefIterator, and I 
started to play with it. I.e. it would wrap a BytesRef but add 'pos' and 'upto' 
indexes.
The user can modify 'pos' freely, without touching bytes.offset. That 
introduces an object allocation though, and since I'd want to reuse that object 
wherever
possible, I think I'll look at it after finishing this issue. It already 
contains too many changes.

bq. I guess this is why you want an upto

No, I wanted upto because iterating up to bytes.length is incorrect. You need 
to iterate up to offset+length. BytesRefIterator.pos and BytesRefIterator.upto 
solve these cases for me.
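
A minimal sketch of the bounds point:

{code}
import org.apache.lucene.util.BytesRef;

// A BytesRef may view a slice in the middle of a larger array, so a scan
// must stop at offset + length; bytes.length is the backing array's size
// and is simply the wrong limit. Illustrative helper, not Lucene code.
class ScanSketch {
  static long sum(BytesRef ref) {
    long total = 0;
    final int upto = ref.offset + ref.length; // correct end bound
    for (int i = ref.offset; i < upto; i++) { // not i < ref.bytes.length
      total += ref.bytes[i];
    }
    return total;
  }
}
{code}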

bq. looks like things got a bit slower (or possibly it's noise)

First, even if it's not noise, the slowdown IMO is worth the code 
simplification. But, I do believe that we'll see gains when there are more than 
3 integers to encode/decode.
In fact, the facets test package has an EncodingSpeed class which measures the 
time it takes to encode/decode a large number of integers (a few thousand). 
When I compared the result to 4x (i.e. without the patch), the decode time 
seemed to be ~5x faster.

In this patch I added an Ant task "run-encoding-benchmark" which runs this 
class. Want to give it a try on your beast machine? For 4x, you can just copy 
the target to lucene/facet/build.xml, I believe it will work without issues.

> Explore IntEncoder/Decoder bulk API
> ---
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-4620.patch, LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so may decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.
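
For reference, a minimal sketch of the API shape the description proposes; the
actual abstract classes in the patch may differ in names and signatures:

{code}
import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRef;

// Assumed shapes only, matching the encode(IntsRef, BytesRef) /
// decode(BytesRef, IntsRef) proposal above.
abstract class IntEncoderSketch {
  /** Encodes values.ints[values.offset .. values.offset+values.length) into buf. */
  public abstract void encode(IntsRef values, BytesRef buf);
}

abstract class IntDecoderSketch {
  /** Decodes buf into values, growing values.ints as needed. */
  public abstract void decode(BytesRef buf, IntsRef values);
}
{code}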

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: 4.1 release

2013-01-10 Thread Steve Rowe
As of now, there are two Blocker issues in JIRA with Fix Version 4.1: 

Dataimporting with SolrCloud Fails
https://issues.apache.org/jira/browse/SOLR-4112

modify release process/scripts to use svn for rc/release publishing 
(svnpubsub)
https://issues.apache.org/jira/browse/LUCENE-4134

(LUCENE-4431 - servlet-api.jar licensing - is listed as Blocker with Fix 
Version including 4.1, but this has been fixed in branch_4x, and was reopened 
only for 3.6.X backporting.)  

LUCENE-4547  (DocValues 2.0) 
is listed as Blocker with Fix Version including 4.2, but recent commits to 
branches/lucene4547/ include changes to the Lucene41 codec.  Looks like Fix 
Version should be changed to 4.1?

I'd like to release soon.  What else blocks this?

Steve

On Dec 31, 2012, at 2:08 PM, Mark Miller  wrote:

> I've started pushing on JIRA issue for a 4.1 release.
> 
> If something is pushed that you are going to work on in the very near term, 
> please put it back.
> 
> I'll progressively get more aggressive about pushing and count on committers 
> to fix any mistakes if they want something in 4.1.
> 
> Remember, 4.2 can come shortly after 4.1.
> 
> Next I will be pushing any 4.1 issues that have not been updated in a couple 
> months.
> 
> - Mark
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4134) modify release process/scripts to use svn for rc/release publishing (svnpubsub)

2013-01-10 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-4134:
---

Fix Version/s: 4.1

> modify release process/scripts to use svn for rc/release publishing 
> (svnpubsub)
> ---
>
> Key: LUCENE-4134
> URL: https://issues.apache.org/jira/browse/LUCENE-4134
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Hoss Man
>Priority: Blocker
> Fix For: 4.1
>
>
> By the end of 2012, all of www.apache.org *INCLUDING THE DIST DIR* must be 
> entirely managed using "svnpubsub" ... our use of the Apache CMS for 
> lucene.apache.org puts us in compliance for our main website, but the dist 
> dir used for publishing release artifacts also needs to be managed via svn.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549930#comment-13549930
 ] 

Robert Muir commented on LUCENE-4431:
-

I did those automatically (when JIRA releases a version, it asks you if you want 
to move out any still-open issues... never saw it before, it's handy though).

But yeah, we should still fix this if we do a 3.6.3, IMO.

> License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
> NOTICE.txt
> -
>
> Key: LUCENE-4431
> URL: https://issues.apache.org/jira/browse/LUCENE-4431
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 3.6.1, 4.0-BETA
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
> Fix For: 4.0, 4.1, 5.0, 3.6.3
>
> Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch
>
>
> - The demo module has servlet-api.jar with an ASF-named license file and the 
> text "TODO: fill in"
> - This also affects Solr: It has a full ASF license file, but that is wrong.
> The servlet-api file is CDDL-licensed: 
> http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
> for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549914#comment-13549914
 ] 

Steve Rowe commented on LUCENE-4431:


ah right, fix version is 3.6.3

> License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
> NOTICE.txt
> -
>
> Key: LUCENE-4431
> URL: https://issues.apache.org/jira/browse/LUCENE-4431
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 3.6.1, 4.0-BETA
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
> Fix For: 4.0, 4.1, 5.0, 3.6.3
>
> Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch
>
>
> - The demo module has servlet-api.jar with an ASF-named license file and the 
> text "TODO: fill in"
> - This also affects Solr: It has a full ASF license file, but that is wrong.
> The servlet-api file is CDDL-licensed: 
> http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
> for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549904#comment-13549904
 ] 

Robert Muir commented on LUCENE-4431:
-

No, because it wasn't fixed in 3.6.2.

> License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
> NOTICE.txt
> -
>
> Key: LUCENE-4431
> URL: https://issues.apache.org/jira/browse/LUCENE-4431
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 3.6.1, 4.0-BETA
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
> Fix For: 4.0, 4.1, 5.0, 3.6.3
>
> Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch
>
>
> - The demo module has servlet-api.jar with an ASF-named license file and the 
> text "TODO: fill in"
> - This also affects Solr: It has a full ASF license file, but that is wrong.
> The servlet-api file is CDDL-licensed: 
> http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
> for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4431) License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add NOTICE.txt

2013-01-10 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549894#comment-13549894
 ] 

Steve Rowe commented on LUCENE-4431:


Can this be resolved now, since 3.6.2 was released?

> License of servlet-api.jar is NOT ASF, it is CDDL! We must fix and add 
> NOTICE.txt
> -
>
> Key: LUCENE-4431
> URL: https://issues.apache.org/jira/browse/LUCENE-4431
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/other
>Affects Versions: 3.6.1, 4.0-BETA
>Reporter: Uwe Schindler
>Assignee: Uwe Schindler
>Priority: Blocker
> Fix For: 4.0, 4.1, 5.0, 3.6.3
>
> Attachments: LUCENE-4431.patch, LUCENE-4431.patch, LUCENE-4431.patch
>
>
> - The demo module has servlet-api.jar with an ASF-named license file and the 
> text "TODO: fill in"
> - This also affects Solr: It has a full ASF license file, but that is wrong.
> The servlet-api file is CDDL-licensed: 
> http://download.oracle.com/otndocs/jcp/servlet-3.0-fr-eval-oth-JSpec/ (same 
> for 2.4). The 3.0.1 JAR file also contains the license in its META-INF folder.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4295) SolrQuery setFacet*() and getFacet*() should have versions that specify the field

2013-01-10 Thread Colin Bartolome (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin Bartolome updated SOLR-4295:
--

Description: 
Since the parameter names for field-specific faceting parameters are a little 
odd (and undocumented), such as "f.field_name.facet.prefix", the SolrQuery 
class should have methods that take a "field" parameter. The 
SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
great if the rest of the setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.

Also, as far as I can tell, there isn't a constant for the "f." prefix. That 
would be helpful, too.

  was:
Since the parameter names for field-specific faceting parameters are a little 
odd (and undocumented), such as "f.field_name.facet.prefix", the SolrQuery 
class should have methods that take a "field" parameter. The 
SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
great if the rest of the setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.


> SolrQuery setFacet*() and getFacet*() should have versions that specify the 
> field
> -
>
> Key: SOLR-4295
> URL: https://issues.apache.org/jira/browse/SOLR-4295
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 4.0
>Reporter: Colin Bartolome
>Priority: Minor
>
> Since the parameter names for field-specific faceting parameters are a little 
> odd (and undocumented), such as "f.field_name.facet.prefix", the SolrQuery 
> class should have methods that take a "field" parameter. The 
> SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
> great if the rest of the setFacet*() and getFacet*() methods did, too.
> The workaround is trivial, albeit clumsy: just create the parameter names by 
> hand, as necessary.
> Also, as far as I can tell, there isn't a constant for the "f." prefix. That 
> would be helpful, too.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4295) SolrQuery setFacet*() and getFacet*() should have versions that specify the field

2013-01-10 Thread Colin Bartolome (JIRA)
Colin Bartolome created SOLR-4295:
-

 Summary: SolrQuery setFacet*() and getFacet*() should have 
versions that specify the field
 Key: SOLR-4295
 URL: https://issues.apache.org/jira/browse/SOLR-4295
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 4.0
Reporter: Colin Bartolome
Priority: Minor


Since the parameter names for field-specific faceting parameters are a little 
odd (and undocumented), such as "f.field_name.facet.prefix", the SolrQuery 
class should have methods that take a "field" parameter. The 
SolrQuery.setFacetPrefix() method already takes such a parameter. It would be 
great if the rest of the setFacet*() and getFacet*() methods did, too.

The workaround is trivial, albeit clumsy: just create the parameter names by 
hand, as necessary.
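
For illustration, the workaround in SolrJ looks roughly like this ("category" 
is just an example field name; only setFacetPrefix is field-aware today):

{code}
import org.apache.solr.client.solrj.SolrQuery;

// Sketch of the hand-built parameter names described above.
SolrQuery query = new SolrQuery("*:*");
query.setFacet(true);
query.addFacetField("category");
query.setFacetPrefix("category", "elec");  // the existing field-aware setter
query.set("f.category.facet.limit", 20);   // hand-built per-field params
query.set("f.category.facet.mincount", 1);
{code}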

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549759#comment-13549759
 ] 

Michael McCandless commented on LUCENE-4620:


Thanks Shai, that new patch worked!

This patch looks great!

It's a little disturbing that every doc must make a new
HashMap at indexing time (seems like a lot of
overhead/objects when the common case just needs to return a single
BytesRef, which could be re-used).  Can we use
Collections.singletonMap when there are no partitions?

The decode API (more important than encode) looks like it reuses the
Bytes/IntsRef, so that's good.

Hmm, why do we have VInt8.bytesNeeded?  Who uses that?  I think that's
a dangerous API to have ... it's better to simply encode and then see
how many bytes it took.

Hmm, it's a little abusive how VInt8.decode changes the offset of the
incoming BytesRef ... I guess this is why you want an upto :)

Net/net this is great progress over what we have today, so +1!

I ran a quick 10M English Wikipedia test w/ just term queries:
{noformat}
Task          QPS base  StdDev    QPS comp  StdDev    Pct diff
HighTerm      12.79     (2.4%)    12.56     (1.2%)    -1.8% (-5% - 1%)
MedTerm       18.04     (1.8%)    17.77     (0.8%)    -1.5% (-4% - 1%)
LowTerm       47.69     (1.1%)    47.56     (1.0%)    -0.3% (-2% - 1%)
{noformat}

The test only has 3 ords per doc so it's not "typical" ... looks like things 
got a bit slower (or possibly it's noise).

> Explore IntEncoder/Decoder bulk API
> ---
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so may decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4294) Solr 4 atomic update incorrect value when setting two or more values to a multivalue via XML update

2013-01-10 Thread Ben Pennell (JIRA)
Ben Pennell created SOLR-4294:
-

 Summary: Solr 4 atomic update incorrect value when setting two or 
more values to a multivalue via XML update
 Key: SOLR-4294
 URL: https://issues.apache.org/jira/browse/SOLR-4294
 Project: Solr
  Issue Type: Bug
  Components: clients - java, update
Affects Versions: 4.0
 Environment: RHEL
Reporter: Ben Pennell
Priority: Minor
 Fix For: 4.0.1, 4.1


Setting multiple values to a multivalued field via an XML atomic update request 
is resulting in what appears to be the output of a toString() method.  See the 
examples below.

I ran into this issue using the output for atomic updates from the fix for 
Solr-4133 to ClientUtils.  The server being used is the base 4.0.0 release.

{code}
curl 'https://localhost/solr/update?commit=true' -H 'Content-type:text/xml' -d '
<add><doc>
<field name="id">test</field>
<field name="status" update="set">one</field>
<field name="status" update="set">two</field>
</doc></add>'
{code}
Yields the following in Solr:
{code}
  {set=one}{set=two}
{code}

Changing the second "set" to an "add" has the same effect.

  If I only set one value though, it works correctly:
{code}
<add><doc>
<field name="id">test</field>
<field name="status" update="set">one</field>
</doc></add>
{code}
  Yields:
{code}
one
{code}

  It also works fine if I split it into two operations
{code}
<add><doc>
<field name="id">test</field>
<field name="status" update="set">one</field>
</doc></add>
<add><doc>
<field name="id">test</field>
<field name="status" update="set">two</field>
</doc></add>
{code}
  Yields:
{code}
onetwo
{code}

  Oddly, it works fine as a single request in JSON:
{code}
curl -k 'http://localhost/solr/update?commit=true' -H 
'Content-type:application/json' -d '["id":"test", {"status":{"set":["one", 
"two"]}}]'
{code}
  Yields:
{code}
onetwo
{code}
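
For context, the SolrJ side of such an atomic update looks roughly like this 
(illustrative only, not the reporter's exact code path through ClientUtils):

{code}
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.common.SolrInputDocument;

// Build an atomic "set" of two values on the multiValued field from the
// examples above; a Map value marks the field as an atomic-update operation.
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "test");
Map<String, Object> setOp = new HashMap<String, Object>();
setOp.put("set", Arrays.asList("one", "two"));
doc.addField("status", setOp);
// server.add(doc); server.commit();  // given a SolrServer instance
{code}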

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config

2013-01-10 Thread JIRA

 [ 
https://issues.apache.org/jira/browse/SOLR-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yago Riveiro Rodríguez closed SOLR-4292.


Resolution: Not A Problem

> After upload and link config collection, the collection in solrcloud not load 
> the new config
> 
>
> Key: SOLR-4292
> URL: https://issues.apache.org/jira/browse/SOLR-4292
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: CentOS release 6.3 (Final)
> Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Yago Riveiro Rodríguez
>
> I'm trying to change the settings for a specific collection, which is empty, 
> with a new config.
> The collection has 2 shards, and the zookeeper is a cluster of 3 servers.
> I used the zookeeper to upload the configuration and link it with the 
> collection. After this, I reloaded the collection in both nodes (replica and 
> leader) but when I try to see the STATUS of collection's core 
> (/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error:
>  
> "ST-4A46DF1563_0812":"org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
>  Specified config does not exist in 
> ZooKeeper:statisticsBucket-aggregation-revision-1"
> The clusterstate.json shows that the ST-4A46DF1563_0812 has loaded the 
> configname: {"configName":"statisticsBucket-aggregation-revision-1"}
> If the zookeeper has the new config loaded and I linked the config to the 
> collection, why does the status of the core say that the configuration is 
> missing?
> /Yago

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config

2013-01-10 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-4292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549712#comment-13549712
 ] 

Yago Riveiro Rodríguez commented on SOLR-4292:
--

My fault, I wrote the confname parameter incorrectly. BTW, the zookeeper's log 
is so verbose that the error gets no visibility.

> After upload and link config collection, the collection in solrcloud not load 
> the new config
> 
>
> Key: SOLR-4292
> URL: https://issues.apache.org/jira/browse/SOLR-4292
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: CentOS release 6.3 (Final)
> Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 
> 2012 x86_64 x86_64 x86_64 GNU/Linux
>Reporter: Yago Riveiro Rodríguez
>
> I'm trying to change the settings for a specific collection, which is empty, 
> with a new config.
> The collection has 2 shards, and the zookeeper is a cluster of 3 servers.
> I used the zookeeper to upload the configuration and link it with the 
> collection. After this, I reloaded the collection in both nodes (replica and 
> leader) but when I try to see the STATUS of collection's core 
> (/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error:
>  
> "ST-4A46DF1563_0812":"org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
>  Specified config does not exist in 
> ZooKeeper:statisticsBucket-aggregation-revision-1"
> The clusterstate.json shows that the ST-4A46DF1563_0812 has loaded the 
> configname: {"configName":"statisticsBucket-aggregation-revision-1"}
> If the zookeeper has the new config loaded and I linked the config to the 
> collection, why does the status of the core say that the configuration is 
> missing?
> /Yago

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4293) Solr throws an NPE when the extracting update handler is called with an empty document

2013-01-10 Thread Karl Wright (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated SOLR-4293:
--

Attachment: SOLR-4293.patch

This patch should fix the problem.

> Solr throws an NPE when the extracting update handler is called with an empty 
> document
> ---
>
> Key: SOLR-4293
> URL: https://issues.apache.org/jira/browse/SOLR-4293
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 4.0
>Reporter: Karl Wright
> Attachments: SOLR-4293.patch
>
>
> When you send an empty document to update/extract, you get this:
> {code}
> SEVERE: java.lang.NullPointerException
>   at org.apache.solr.handler.extraction.SolrContentHandler.addLiterals(SolrContentHandler.java:164)
>   at org.apache.solr.handler.extraction.SolrContentHandler.newDocument(SolrContentHandler.java:115)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
>   at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>   at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
>   at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
>   at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
>   at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
>   at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
>   at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
>   at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
>   at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
>   at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
>   at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
>   at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
>   at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
>   at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383)
>   at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
>   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
>   at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
>   at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
>   at java.lang.Thread.run(Thread.java:722)
> {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4293) Solr throws an NPE when the extracting update handler is called with an empty document

2013-01-10 Thread Karl Wright (JIRA)
Karl Wright created SOLR-4293:
-

 Summary: Solr throws an NPE when the extracting update handler is 
called with an empty document
 Key: SOLR-4293
 URL: https://issues.apache.org/jira/browse/SOLR-4293
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Karl Wright


When you send an empty document to update/extract, you get this:

{code}
SEVERE: java.lang.NullPointerException
at org.apache.solr.handler.extraction.SolrContentHandler.addLiterals(SolrContentHandler.java:164)
at org.apache.solr.handler.extraction.SolrContentHandler.newDocument(SolrContentHandler.java:115)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:120)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:126)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:455)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:276)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:244)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:240)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:541)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:383)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:243)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:188)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:166)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:288)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
{code}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-4292) After upload and link config collection, the collection in solrcloud not load the new config

2013-01-10 Thread JIRA
Yago Riveiro Rodríguez created SOLR-4292:


 Summary: After upload and link config collection, the collection 
in solrcloud not load the new config
 Key: SOLR-4292
 URL: https://issues.apache.org/jira/browse/SOLR-4292
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: CentOS release 6.3 (Final)

Linux app-solr-00 2.6.32-279.14.1.el6.x86_64 #1 SMP Tue Nov 6 23:43:09 UTC 2012 
x86_64 x86_64 x86_64 GNU/Linux
Reporter: Yago Riveiro Rodríguez


I'm trying to change the settings for a specific collection, which is empty, 
with a new config.

The collection has 2 shards, and the zookeeper is a cluster of 3 servers.

I used the zookeeper to upload the configuration and link it with the 
collection. After this, I reloaded the collection in both nodes (replica and 
leader) but when I try to see the STATUS of collection's core 
(/solr/admin/cores?action=STATUS&wt=json&indent=true) I get this error:
 
"ST-4A46DF1563_0812":"org.apache.solr.common.cloud.ZooKeeperException:org.apache.solr.common.cloud.ZooKeeperException:
 Specified config does not exist in 
ZooKeeper:statisticsBucket-aggregation-revision-1"

The clusterstate.json shows that the ST-4A46DF1563_0812 has loaded the 
configname: {"configName":"statisticsBucket-aggregation-revision-1"}

If the zookeeper has the new config loaded and I linked the config to the 
collection, why does the status of the core say that the configuration is 
missing?

/Yago

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549654#comment-13549654
 ] 

Uwe Schindler commented on LUCENE-4675:
---

Strong +1 to make BytesRef a byte[] reference only. BytesRef is unfortunately a 
user-facing class in Lucene 4.x, so we have to look into this. I was also 
planning to fix this before 4.0, but we had no time. This was one of the last 
classes Robert and I did not fix in the final cleanup before release, which is 
a pity.

> remove *Ref.copy/append/grow
> 
>
> Key: LUCENE-4675
> URL: https://issues.apache.org/jira/browse/LUCENE-4675
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> These methods are dangerous:
> In general if we want a StringBuilder type class, then it should own the 
> array, and it can freely do allocation stuff etc. this is the only way to 
> make it safe.
> Otherwise if we want a ByteBuffer type class, then its reference should be 
> immutable (the byte[]/offset/length should be final), and it should not have 
> allocation stuff.
> BytesRef is none of these; it's like a C pointer. Unfortunately Lucene puts 
> these unsafe, dangerous, trappy APIs directly in front of the user.
> What happens if I have a bug in my application and it accidentally mucks with 
> the term bytes returned by TermsEnum or the payloads from 
> DocsAndPositionsEnum? Will this get merged into a corrupt index?
> I think as a start we should remove these copy/append/grow to minimize this 
> closer to a ref class (e.g. more like java.lang.ref and less like 
> stringbuilder). Nobody needs this stuff on bytesref, they can already operate 
> on the bytes directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3354) Extend FieldCache architecture to multiple Values

2013-01-10 Thread Varun Thacker (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549651#comment-13549651
 ] 

Varun Thacker commented on LUCENE-3354:
---

Hi,

I have a question about FieldCache support for multiValued fields in general. 
FieldCache on a multiValued field works by consuming it through 
FieldCache.DocTermOrds, but:

* I was trying out FunctionQuery in Solr and still got a "cannot FieldCache on 
multiValued field" error. This is because any impl. of FieldCacheSource, for 
example StrFieldSource#getValues(), returns DocTermsIndexDocValues, which 
loads a FieldCache.DocTermsIndex instance. Is this supposed to be consumed 
like this?

* Secondly, slightly off topic, but I went through the lucene4547 branch where 
there was a discussion on how to consume DocValues. I'm still trying to figure 
out a lot of stuff around DocValues, FieldCache etc., but do we need to discuss 
all these issues and their impact on Solr and ES as a whole?

> Extend FieldCache architecture to multiple Values
> -
>
> Key: LUCENE-3354
> URL: https://issues.apache.org/jira/browse/LUCENE-3354
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Bill Bell
> Fix For: 4.0-ALPHA
>
> Attachments: LUCENE-3354.patch, LUCENE-3354.patch, 
> LUCENE-3354_testspeed.patch
>
>
> I would consider this a bug. It appears lots of people are working around 
> this limitation, 
> so why don't we just change the underlying data structures to natively 
> support multiValued fields in the FieldCache architecture?
> Then functions() will work properly, and we can do things like easily 
> run geodist() on a multiValued field.
> Thoughts?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.6.0) - Build # 70 - Failure!

2013-01-10 Thread Robert Muir
JVM Crash:

[junit4:junit4] Suite: org.apache.solr.cloud.FullSolrCloudDistribCmdsTest
[junit4:junit4] Completed in 32.12s, 1 test
[junit4:junit4]
[junit4:junit4] JVM J0: stdout was not empty, see:
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20130110_132632_493.sysout
[junit4:junit4] >>> JVM J0: stdout (verbatim) 
[junit4:junit4] Invalid memory access of location 0x0 rip=0x7fff8f93db43
[junit4:junit4] <<< JVM J0: EOF 
[junit4:junit4] Execution time total: 18 minutes 36 seconds


On Thu, Jan 10, 2013 at 8:45 AM, Policeman Jenkins Server
 wrote:
> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/70/
> Java: 64bit/jdk1.6.0 -XX:+UseSerialGC
>
> All tests passed
>
> Build Log:
> [...truncated 8383 lines...]
> [junit4:junit4] ERROR: JVM J0 ended with an exception, command line: 
> /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java 
> -XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError 
> -XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps
>  -Dtests.prefix=tests -Dtests.seed=BC09482A7937D842 -Xmx512M -Dtests.iters= 
> -Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
> -Dtests.postingsformat=random -Dtests.locale=random -Dtests.timezone=random 
> -Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz 
> -Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass 
> -Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/testlogging.properties
>  -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
> -Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
> -Djava.io.tmpdir=. 
> -Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
>  
> -Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
>  -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
> -Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
>  -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
> -Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
> -Djava.awt.headless=true -Dfile.encoding=ISO-8859-1 -classpath 
> /Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/cglib-nodep-2.2.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-fileupload-1.2.1.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-lang-2.6.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib

[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.6.0) - Build # 70 - Failure!

2013-01-10 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/70/
Java: 64bit/jdk1.6.0 -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 8383 lines...]
[junit4:junit4] ERROR: JVM J0 ended with an exception, command line: 
/System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents/Home/bin/java 
-XX:+UseSerialGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/heapdumps
 -Dtests.prefix=tests -Dtests.seed=BC09482A7937D842 -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.locale=random -Dtests.timezone=random 
-Dtests.directory=random -Dtests.linedocsfile=europarl.lines.txt.gz 
-Dtests.luceneMatchVersion=5.0 -Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/testlogging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.slow=true 
-Dtests.asserts.gracious=false -Dtests.multiplier=1 -DtempDir=. 
-Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Dfile.encoding=ISO-8859-1 -classpath 
/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test-files:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/cglib-nodep-2.2.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.7.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-fileupload-1.2.1.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-lang-2.6.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/easymock-3.0.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/guava-13.0.1.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/javax.servlet-api-3.0.1.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/objenesis-1.2.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/spatial4j-0.3.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/solrj/lib/commons-io-2.1.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/solr/solrj/lib/httpclient-4.1.3.jar:/Users/jenkins/jenkins-slave/workspace/Lucene-Solr-trunk-MacOSX/s

[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549637#comment-13549637
 ] 

Robert Muir commented on LUCENE-4675:
-

I don't think we should add more functionality to these *Ref classes: they have 
too many traps and bugs already.

Less is more here.

> remove *Ref.copy/append/grow
> 
>
> Key: LUCENE-4675
> URL: https://issues.apache.org/jira/browse/LUCENE-4675
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> These methods are dangerous:
> In general, if we want a StringBuilder-type class, then it should own the 
> array, and it can freely do allocation etc.; this is the only way to make it 
> safe.
> Otherwise, if we want a ByteBuffer-type class, then its reference should be 
> immutable (the byte[]/offset/length should be final), and it should not have 
> allocation stuff.
> BytesRef is neither of these; it's like a C pointer. Unfortunately Lucene puts 
> these unsafe, dangerous, trappy APIs directly in front of the user.
> What happens if I have a bug in my application and it accidentally mucks with 
> the term bytes returned by TermsEnum or the payloads from 
> DocsAndPositionsEnum? Will this get merged into a corrupt index?
> I think as a start we should remove copy/append/grow to bring this closer to 
> a ref class (e.g. more like java.lang.ref and less like StringBuilder). 
> Nobody needs this stuff on BytesRef; they can already operate on the bytes 
> directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549632#comment-13549632
 ] 

Shai Erera commented on LUCENE-4675:


bq. you can separately make your own BytesRefIterator class

I can. I wanted to avoid additional object allocations, but such an Iterator 
class can have a reset(BytesRef) method which will update the pos and upto 
members accordingly. I was thinking that an 'upto' index might be useful for 
others. For my purposes (see LUCENE-4620) I just use bytes.offset as 'pos', 
compute an 'upto' and pass it along. I will think about the Iterator class 
though, perhaps it's not a bad idea. And maybe *Ref can have an iterator() 
method which returns the proper one ... or not.
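
To make the idea concrete, here is a minimal sketch of such an iterator, 
assuming Lucene 4.x's public BytesRef fields; the class and method names are 
illustrative only, not a concrete proposal:

import org.apache.lucene.util.BytesRef;

// Hypothetical sketch: a reusable cursor over a BytesRef's valid region,
// keeping 'pos' and 'upto' outside the *Ref class itself.
final class BytesRefCursor {
  private byte[] bytes;
  private int pos;   // index of the next byte to read
  private int upto;  // one past the last valid byte

  // Re-point the cursor at another ref; no allocation happens here.
  void reset(BytesRef ref) {
    bytes = ref.bytes;
    pos = ref.offset;
    upto = ref.offset + ref.length;
  }

  boolean hasNext() { return pos < upto; }

  byte next() { return bytes[pos++]; }
}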

> remove *Ref.copy/append/grow
> 
>
> Key: LUCENE-4675
> URL: https://issues.apache.org/jira/browse/LUCENE-4675
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> These methods are dangerous:
> In general, if we want a StringBuilder-type class, then it should own the 
> array, and it can freely do allocation etc.; this is the only way to make it 
> safe.
> Otherwise, if we want a ByteBuffer-type class, then its reference should be 
> immutable (the byte[]/offset/length should be final), and it should not have 
> allocation stuff.
> BytesRef is neither of these; it's like a C pointer. Unfortunately Lucene puts 
> these unsafe, dangerous, trappy APIs directly in front of the user.
> What happens if I have a bug in my application and it accidentally mucks with 
> the term bytes returned by TermsEnum or the payloads from 
> DocsAndPositionsEnum? Will this get merged into a corrupt index?
> I think as a start we should remove copy/append/grow to bring this closer to 
> a ref class (e.g. more like java.lang.ref and less like StringBuilder). 
> Nobody needs this stuff on BytesRef; they can already operate on the bytes 
> directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549630#comment-13549630
 ] 

Robert Muir commented on LUCENE-4674:
-

{quote}
"allocating a new byte[] if someOtherStuff offset + length > this.offset + 
length?" ?
{quote}

This, preventing a.copy(otherStuff) from overflowing onto b.

I don't want any other functionality in this class. It needs less, not more.

{quote}
Regarding the idea to switch to the java.nio buffers, are there some traps 
besides backward compatibility? Should we start migrating our internal APIs to 
this API (and maybe even the public ones for 5.0?).
{quote}

I haven't even thought about it really. I actually am less concerned about our 
internal APIs.

It's the public ones I care about.

I would care a lot less about BytesRef & co if users weren't forced to interact 
with them.
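
For reference, a minimal sketch of the trap being prevented here, assuming two 
refs deliberately sliced over one shared byte[] (the sizes are made up):

import org.apache.lucene.util.BytesRef;

byte[] shared = new byte[10];
BytesRef a = new BytesRef(shared, 0, 5); // first half of the array
BytesRef b = new BytesRef(shared, 5, 5); // second half of the array

// copyBytes sees enough room after a.offset (10 bytes), so it copies in
// place -- and the last 3 bytes land inside b's region:
a.copyBytes(new BytesRef(new byte[] {1, 2, 3, 4, 5, 6, 7, 8}));
// shared[5..7], i.e. part of b, has now been silently overwritten.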

> Consistently set offset=0 in BytesRef.copyBytes
> ---
>
> Key: LUCENE-4674
> URL: https://issues.apache.org/jira/browse/LUCENE-4674
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-4674.patch
>
>
> BytesRef.copyBytes(BytesRef other) has two branches:
>  - either the destination array is large enough and it will copy bytes after 
> offset,
>  - or it needs to resize and in that case it will set offset = 0.
> I think this method should always set offset = 0 for consistency, and to 
> avoid resizing when other.length is larger than this.bytes.length - 
> this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549625#comment-13549625
 ] 

Adrien Grand commented on LUCENE-4674:
--

bq. I still like the idea of fixing this myself (maybe Shai's idea?). i don't 
like this kind of dangerous stuff!!

The 'upto' idea or "allocating a new byte[] if someOtherStuff offset + length > 
this.offset + length?" ?

bq. I ultimately think LUCENE-4675 is the next logical step, but can we remove 
this a.copy()-overwrites-b trap as an incremental improvement?

Regarding the idea to switch to the java.nio buffers, are there some traps 
besides backward compatibility? Should we start migrating our internal APIs to 
this API (and maybe even the public ones for 5.0?).
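
As a rough illustration of the java.nio direction (not a proposal for a 
concrete API):

import java.nio.ByteBuffer;

byte[] backing = new byte[1024];
// A view over backing[128..143]; position/limit are per-view state.
ByteBuffer view = ByteBuffer.wrap(backing, 128, 16).slice();
// A read-only handle prevents callers from writing through the ref,
// though the owner can still mutate the shared backing array.
ByteBuffer readOnly = view.asReadOnlyBuffer();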

> Consistently set offset=0 in BytesRef.copyBytes
> ---
>
> Key: LUCENE-4674
> URL: https://issues.apache.org/jira/browse/LUCENE-4674
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-4674.patch
>
>
> BytesRef.copyBytes(BytesRef other) has two branches:
>  - either the destination array is large enough and it will copy bytes after 
> offset,
>  - or it needs to resize and in that case it will set offset = 0.
> I think this method should always set offset = 0 for consistency, and to 
> avoid resizing when other.length is larger than this.bytes.length - 
> this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549611#comment-13549611
 ] 

Robert Muir commented on LUCENE-4675:
-

I don't think we need any additional members in this thing. What more does it 
need other than byte[], offset, length?!

I want to remove the extraneous stuff. If you want to make an iterator, you can 
separately make your own BytesRefIterator class?

> remove *Ref.copy/append/grow
> 
>
> Key: LUCENE-4675
> URL: https://issues.apache.org/jira/browse/LUCENE-4675
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> These methods are dangerous:
> In general, if we want a StringBuilder-type class, then it should own the 
> array, and it can freely do allocation etc.; this is the only way to make it 
> safe.
> Otherwise, if we want a ByteBuffer-type class, then its reference should be 
> immutable (the byte[]/offset/length should be final), and it should not have 
> allocation stuff.
> BytesRef is neither of these; it's like a C pointer. Unfortunately Lucene puts 
> these unsafe, dangerous, trappy APIs directly in front of the user.
> What happens if I have a bug in my application and it accidentally mucks with 
> the term bytes returned by TermsEnum or the payloads from 
> DocsAndPositionsEnum? Will this get merged into a corrupt index?
> I think as a start we should remove copy/append/grow to bring this closer to 
> a ref class (e.g. more like java.lang.ref and less like StringBuilder). 
> Nobody needs this stuff on BytesRef; they can already operate on the bytes 
> directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549598#comment-13549598
 ] 

Shai Erera commented on LUCENE-4675:


OK. While you're at it, what do you think about adding an 'upto' member for 
easier iteration over the bytes/ints/chars? (see my comment on LUCENE-4674)

> remove *Ref.copy/append/grow
> 
>
> Key: LUCENE-4675
> URL: https://issues.apache.org/jira/browse/LUCENE-4675
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> These methods are dangerous:
> In general, if we want a StringBuilder-type class, then it should own the 
> array, and it can freely do allocation etc.; this is the only way to make it 
> safe.
> Otherwise, if we want a ByteBuffer-type class, then its reference should be 
> immutable (the byte[]/offset/length should be final), and it should not have 
> allocation stuff.
> BytesRef is neither of these; it's like a C pointer. Unfortunately Lucene puts 
> these unsafe, dangerous, trappy APIs directly in front of the user.
> What happens if I have a bug in my application and it accidentally mucks with 
> the term bytes returned by TermsEnum or the payloads from 
> DocsAndPositionsEnum? Will this get merged into a corrupt index?
> I think as a start we should remove copy/append/grow to bring this closer to 
> a ref class (e.g. more like java.lang.ref and less like StringBuilder). 
> Nobody needs this stuff on BytesRef; they can already operate on the bytes 
> directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549596#comment-13549596
 ] 

Robert Muir commented on LUCENE-4674:
-

{quote}
Unfortunately a.copy(otherStuff) will modify b if otherStuff.length > 5.
{quote}

I still like the idea of fixing this myself (maybe Shai's idea?). I don't like 
this kind of dangerous stuff!!

I ultimately think LUCENE-4675 is the next logical step, but can we remove this 
a.copy()-overwrites-b trap as an incremental improvement?

That's a bug, in my opinion.

> Consistently set offset=0 in BytesRef.copyBytes
> ---
>
> Key: LUCENE-4674
> URL: https://issues.apache.org/jira/browse/LUCENE-4674
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-4674.patch
>
>
> BytesRef.copyBytes(BytesRef other) has two branches:
>  - either the destination array is large enough and it will copy bytes after 
> offset,
>  - or it needs to resize and in that case it will set offset = 0.
> I think this method should always set offset = 0 for consistency, and to 
> avoid resizing when other.length is larger than this.bytes.length - 
> this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-4674.
--

Resolution: Won't Fix

> Consistently set offset=0 in BytesRef.copyBytes
> ---
>
> Key: LUCENE-4674
> URL: https://issues.apache.org/jira/browse/LUCENE-4674
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-4674.patch
>
>
> BytesRef.copyBytes(BytesRef other) has two branches:
>  - either the destination array is large enough and it will copy bytes after 
> offset,
>  - or it needs to resize and in that case it will set offset = 0.
> I think this method should always set offset = 0 for consistency, and to 
> avoid resizing when other.length is larger than this.bytes.length - 
> this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549590#comment-13549590
 ] 

Robert Muir commented on LUCENE-4675:
-

I'm proposing removing these 3 methods from BytesRef itself, that's all.

The guy from the outside knows what he can do: he knows if the bytes actually 
point to a slice of a PagedBytes
(grow is actually senseless here!), or just a simple byte[], or whatever. He 
doesn't need BytesRef itself to do these things.

So he can then change the ref to point at a different slice, or a different 
byte[] altogether, or whatever.
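
For example, a caller who owns the bytes can re-point the ref directly (a 
sketch; the slice bounds are made up):

import org.apache.lucene.util.BytesRef;

byte[] page = new byte[4096]; // e.g. a block the caller owns
BytesRef ref = new BytesRef();
// Point the ref at page[128..143]; no helper on BytesRef is needed.
ref.bytes = page;
ref.offset = 128;
ref.length = 16;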

> remove *Ref.copy/append/grow
> 
>
> Key: LUCENE-4675
> URL: https://issues.apache.org/jira/browse/LUCENE-4675
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> These methods are dangerous:
> In general, if we want a StringBuilder-type class, then it should own the 
> array, and it can freely do allocation etc.; this is the only way to make it 
> safe.
> Otherwise, if we want a ByteBuffer-type class, then its reference should be 
> immutable (the byte[]/offset/length should be final), and it should not have 
> allocation stuff.
> BytesRef is neither of these; it's like a C pointer. Unfortunately Lucene puts 
> these unsafe, dangerous, trappy APIs directly in front of the user.
> What happens if I have a bug in my application and it accidentally mucks with 
> the term bytes returned by TermsEnum or the payloads from 
> DocsAndPositionsEnum? Will this get merged into a corrupt index?
> I think as a start we should remove copy/append/grow to bring this closer to 
> a ref class (e.g. more like java.lang.ref and less like StringBuilder). 
> Nobody needs this stuff on BytesRef; they can already operate on the bytes 
> directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549589#comment-13549589
 ] 

Shai Erera commented on LUCENE-4675:


I kinda like grow(). Will I be able to grow() the buffer from the outside if 
you remove it? I.e. will the byte[] not be final?
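
Assuming the byte[] stays a non-final public field, growing from the outside 
remains a one-liner, e.g. via ArrayUtil (sketch):

import org.apache.lucene.util.ArrayUtil;
import org.apache.lucene.util.BytesRef;

BytesRef ref = new BytesRef(new byte[8]);
if (ref.bytes.length < 32) {
  // ArrayUtil.grow over-allocates for amortized growth and copies the
  // existing contents into the new array.
  ref.bytes = ArrayUtil.grow(ref.bytes, 32);
}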

> remove *Ref.copy/append/grow
> 
>
> Key: LUCENE-4675
> URL: https://issues.apache.org/jira/browse/LUCENE-4675
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>
> These methods are dangerous:
> In general, if we want a StringBuilder-type class, then it should own the 
> array, and it can freely do allocation etc.; this is the only way to make it 
> safe.
> Otherwise, if we want a ByteBuffer-type class, then its reference should be 
> immutable (the byte[]/offset/length should be final), and it should not have 
> allocation stuff.
> BytesRef is neither of these; it's like a C pointer. Unfortunately Lucene puts 
> these unsafe, dangerous, trappy APIs directly in front of the user.
> What happens if I have a bug in my application and it accidentally mucks with 
> the term bytes returned by TermsEnum or the payloads from 
> DocsAndPositionsEnum? Will this get merged into a corrupt index?
> I think as a start we should remove copy/append/grow to bring this closer to 
> a ref class (e.g. more like java.lang.ref and less like StringBuilder). 
> Nobody needs this stuff on BytesRef; they can already operate on the bytes 
> directly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4676) IndexReader.isCurrent race

2013-01-10 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4676:
---

 Summary: IndexReader.isCurrent race
 Key: LUCENE-4676
 URL: https://issues.apache.org/jira/browse/LUCENE-4676
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


Revision: 1431169

ant test  -Dtestcase=TestNRTManager 
-Dtests.method=testThreadStarvationNoDeleteNRTReader 
-Dtests.seed=925ECD106FBFA3FF -Dtests.slow=true -Dtests.locale=fr_CA 
-Dtests.timezone=America/Kentucky/Louisville -Dtests.file.encoding=US-ASCII 
-Dtests.dups=500

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-4675) remove *Ref.copy/append/grow

2013-01-10 Thread Robert Muir (JIRA)
Robert Muir created LUCENE-4675:
---

 Summary: remove *Ref.copy/append/grow
 Key: LUCENE-4675
 URL: https://issues.apache.org/jira/browse/LUCENE-4675
 Project: Lucene - Core
  Issue Type: Bug
Reporter: Robert Muir


These methods are dangerous:

In general, if we want a StringBuilder-type class, then it should own the array, 
and it can freely do allocation etc.; this is the only way to make it safe.

Otherwise, if we want a ByteBuffer-type class, then its reference should be 
immutable (the byte[]/offset/length should be final), and it should not have 
allocation stuff.

BytesRef is neither of these; it's like a C pointer. Unfortunately Lucene puts 
these unsafe, dangerous, trappy APIs directly in front of the user.

What happens if I have a bug in my application and it accidentally mucks with 
the term bytes returned by TermsEnum or the payloads from DocsAndPositionsEnum? 
Will this get merged into a corrupt index?

I think as a start we should remove copy/append/grow to bring this closer to a 
ref class (e.g. more like java.lang.ref and less like StringBuilder). Nobody 
needs this stuff on BytesRef; they can already operate on the bytes directly.
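
A sketch of the ByteBuffer-style alternative described above, purely for 
illustration:

// The reference itself is immutable; re-pointing means creating a new ref.
// The bytes it points at can still change underneath it, like a C pointer
// into someone else's buffer, but the ref can never grow or overwrite a
// neighboring slice.
final class ImmutableBytesRef {
  final byte[] bytes;
  final int offset;
  final int length;

  ImmutableBytesRef(byte[] bytes, int offset, int length) {
    this.bytes = bytes;
    this.offset = offset;
    this.length = length;
  }
}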

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-4620:
---

Attachment: LUCENE-4620.patch

Sorry. Can you try now?

> Explore IntEncoder/Decoder bulk API
> ---
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-4620.patch, LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so maybe decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4620) Explore IntEncoder/Decoder bulk API

2013-01-10 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549568#comment-13549568
 ] 

Michael McCandless commented on LUCENE-4620:


Looks like there were some svn mv's, so the patch doesn't directly apply ...

Can you regenerate the patch using 'svn diff --show-copies-as-adds' (assuming 
you're using svn 1.7+)?

Either that or use dev-tools/scripts/diffSources.py ... thanks.

> Explore IntEncoder/Decoder bulk API
> ---
>
> Key: LUCENE-4620
> URL: https://issues.apache.org/jira/browse/LUCENE-4620
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/facet
>Reporter: Shai Erera
> Attachments: LUCENE-4620.patch
>
>
> Today, IntEncoder/Decoder offer a streaming API, where you can encode(int) 
> and decode(int). Originally, we believed that this layer can be useful for 
> other scenarios, but in practice it's used only for writing/reading the 
> category ordinals from payload/DV.
> Therefore, Mike and I would like to explore a bulk API, something like 
> encode(IntsRef, BytesRef) and decode(BytesRef, IntsRef). Perhaps the Encoder 
> can still be streaming (as we don't know in advance how many ints will be 
> written), dunno. Will figure this out as we go.
> One thing to check is whether the bulk API can work w/ e.g. facet 
> associations, which can write arbitrary byte[], and so maybe decoding to an 
> IntsRef won't make sense. This too we'll figure out as we go. I don't rule 
> out that associations will use a different bulk API.
> At the end of the day, the requirement is for someone to be able to configure 
> how ordinals are written (i.e. different encoding schemes: VInt, PackedInts 
> etc.) and later read, with as little overhead as possible.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549567#comment-13549567
 ] 

Uwe Schindler commented on LUCENE-3178:
---

I think this is largely related to Robert's comment:
bq. Might be interesting to revisit now that we use block compression that 
doesn't readByte(), readByte(), readByte() and hopefully avoids some of the 
bounds checks and so on that I think it helped with.

Since we moved to block codecs, the use of single-byte gets on the byte buffer 
is largely reduced. It now just reads blocks of data, so MappedByteBuffer can 
do that efficiently using a memcpy(). Some MTQs are still faster because they 
read many more blocks for a large number of terms. I would have expected no 
significant speedup at all for, e.g., NRQ.

Additionally, when using the ByteBuffer methods to get bytes, I think newer 
Java versions use intrinsics, which may no longer be used with your directory 
impl.

I would not provide a custom MMapDir at all; it is too risky and does not 
really bring a large speedup anymore (Java 7 + block postings).
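
To illustrate the access-pattern change (a sketch, not actual codec code):

import java.io.IOException;
import org.apache.lucene.store.IndexInput;

// Old-style decoding touched the mapped buffer one byte at a time; block
// codecs pull whole blocks, which a MappedByteBuffer can service with a
// single bulk copy instead of a bounds check per byte.
static void readBlock(IndexInput in, byte[] block) throws IOException {
  in.readBytes(block, 0, block.length); // one bulk call per block
}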

> Native MMapDir
> --
>
> Key: LUCENE-3178
> URL: https://issues.apache.org/jira/browse/LUCENE-3178
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/store
>Reporter: Michael McCandless
>  Labels: gsoc2012, lucene-gsoc-12
> Attachments: LUCENE-3178-Native-MMap-implementation.patch, 
> LUCENE-3178-Native-MMap-implementation.patch, 
> LUCENE-3178-Native-MMap-implementation.patch
>
>
> Spinoff from LUCENE-2793.
> Just like we will create native Dir impl (UnixDirectory) to pass the right OS 
> level IO flags depending on the IOContext, we could in theory do something 
> similar with MMapDir.
> The problem is MMap is apparently quite hairy... and to pass the flags the 
> native code would need to invoke mmap (I think?), unlike UnixDir where the 
> code "only" has to open the file handle.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (SOLR-4112) Dataimporting with SolrCloud Fails

2013-01-10 Thread Erick Erickson
Sausarkar:

When you say the index went from 14G to 7G, did you notice whether the
difference was in the *.fdt and *.fdx files? That would be due to the
compression of stored fields, which is now the default. If you could,
would you let us know the sizes of the files with those two extensions
before and after? I'm trying to gather real-world examples...

But about your slowdown, does the same thing happen if you specify
&fl=score (and ensure that lazy load is configured in solrconfig.xml)? I
don't think that would be reading the fields off disk and decompressing
them...

What are you measuring? Total time to return to the client? It'd also help
pin this down if you looked just at QTime in the responses; that should be
exclusive of the time to assemble the documents, since it's purely searching.
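
For example (host and collection names are hypothetical):

http://localhost:8983/solr/collection1/select?q=*:*&fl=score

and then compare the QTime entry in the response header, e.g.:

<int name="QTime">11</int>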

Thanks,
Erick


On Wed, Jan 9, 2013 at 8:50 PM, sausarkar  wrote:

> We are using solr-meter for generating query load of around 110 Queries per
> second per node.
>
> With 4.1 with the average query time is 300 msec if we switch to 4.0 the
> average query time is around 11 msec. We used the same load test params and
> same 10 million records, only differences are the version and index files,
> 4.1 has 7GB and 4.0 has 14GB.
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/jira-Created-SOLR-4112-Dataimporting-with-SolrCloud-Fails-tp4022365p4032084.html
> Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Commented] (LUCENE-4670) Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier

2013-01-10 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549562#comment-13549562
 ] 

Commit Tag Bot commented on LUCENE-4670:


[branch_4x commit] Adrien Grand
http://svn.apache.org/viewvc?view=revision&revision=1431294

LUCENE-4670: Add finish* callbacks to StoredFieldsWriter and TermVectorsWriter.



> Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new 
> formats easier
> --
>
> Key: LUCENE-4670
> URL: https://issues.apache.org/jira/browse/LUCENE-4670
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4670.patch, LUCENE-4670.patch, LUCENE-4670.patch, 
> LUCENE-4670.patch
>
>
> This is especially useful to LUCENE-4599 where actions have to be taken after 
> a doc/field/term has been added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [jira] [Commented] (LUCENE-3178) Native MMapDir

2013-01-10 Thread Erick Erickson
Haven't run across "play up" in this context (I was raised on the wrong side
of the Atlantic), but three definitions I found _all_ apply:

1> *Brit* *informal* to behave irritatingly (towards)
2> *(intr)* *Brit* *informal* (of a machine, car, etc.) to function
erratically
*3> * *Brit* *informal* to hurt; give (one) pain or trouble

Don't think I've found another two-word phrase that packs in that many
varieties of how computers are mean to me so efficiently. Gotta add that
one to my vocabulary


On Wed, Jan 9, 2013 at 2:40 PM, Greg Bowyer (JIRA)  wrote:

>
> [
> https://issues.apache.org/jira/browse/LUCENE-3178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13548885#comment-13548885]
>
> Greg Bowyer commented on LUCENE-3178:
> -
>
> Frustrating, it echoes what I have been seeing, so at least my benchmarking
> is not playing me up. I guess I will have to do some digging.
>
> > Native MMapDir
> > --
> >
> > Key: LUCENE-3178
> > URL: https://issues.apache.org/jira/browse/LUCENE-3178
> > Project: Lucene - Core
> >  Issue Type: Improvement
> >  Components: core/store
> >Reporter: Michael McCandless
> >  Labels: gsoc2012, lucene-gsoc-12
> > Attachments: LUCENE-3178-Native-MMap-implementation.patch,
> LUCENE-3178-Native-MMap-implementation.patch,
> LUCENE-3178-Native-MMap-implementation.patch
> >
> >
> > Spinoff from LUCENE-2793.
> > Just like we will create native Dir impl (UnixDirectory) to pass the
> right OS level IO flags depending on the IOContext, we could in theory do
> something similar with MMapDir.
> > The problem is MMap is apparently quite hairy... and to pass the flags
> the native code would need to invoke mmap (I think?), unlike UnixDir where
> the code "only" has to open the file handle.
>
> --
> This message is automatically generated by JIRA.
> If you think it was sent incorrectly, please contact your JIRA
> administrators
> For more information on JIRA, see: http://www.atlassian.com/software/jira
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


[jira] [Resolved] (LUCENE-4670) Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier

2013-01-10 Thread Adrien Grand (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-4670.
--

Resolution: Fixed

> Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new 
> formats easier
> --
>
> Key: LUCENE-4670
> URL: https://issues.apache.org/jira/browse/LUCENE-4670
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4670.patch, LUCENE-4670.patch, LUCENE-4670.patch, 
> LUCENE-4670.patch
>
>
> This is especially useful to LUCENE-4599 where actions have to be taken after 
> a doc/field/term has been added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549554#comment-13549554
 ] 

Robert Muir commented on LUCENE-4674:
-

I will open a new issue to remove all write methods from BytesRef.

This is a ref class, not a StringBuilder. We have to keep these APIs contained.

> Consistently set offset=0 in BytesRef.copyBytes
> ---
>
> Key: LUCENE-4674
> URL: https://issues.apache.org/jira/browse/LUCENE-4674
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-4674.patch
>
>
> BytesRef.copyBytes(BytesRef other) has two branches:
>  - either the destination array is large enough and it will copy bytes after 
> offset,
>  - or it needs to resize and in that case it will set offset = 0.
> I think this method should always set offset = 0 for consistency, and to 
> avoid resizing when other.length is larger than this.bytes.length - 
> this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4670) Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new formats easier

2013-01-10 Thread Commit Tag Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549549#comment-13549549
 ] 

Commit Tag Bot commented on LUCENE-4670:


[trunk commit] Adrien Grand
http://svn.apache.org/viewvc?view=revision&revision=1431283

LUCENE-4670: Add finish* callbacks to StoredFieldsWriter and TermVectorsWriter.



> Add TermVectorsWriter.finish{Doc,Field,Term} to make development of new 
> formats easier
> --
>
> Key: LUCENE-4670
> URL: https://issues.apache.org/jira/browse/LUCENE-4670
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Fix For: 4.1
>
> Attachments: LUCENE-4670.patch, LUCENE-4670.patch, LUCENE-4670.patch, 
> LUCENE-4670.patch
>
>
> This is especially useful to LUCENE-4599 where actions have to be taken after 
> a doc/field/term has been added.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549546#comment-13549546
 ] 

Shai Erera commented on LUCENE-4674:


How about allocating a new byte[] if someOtherStuff.offset + length > 
this.offset + length?
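
One possible reading of that, as a sketch (the helper name and exact condition 
are illustrative, not an actual patch):

import org.apache.lucene.util.BytesRef;

static void safeCopyBytes(BytesRef dest, BytesRef other) {
  if (other.length > dest.length) {
    // The copy would spill past dest's slice: give dest its own array
    // instead of overwriting a neighbor's bytes.
    dest.bytes = new byte[other.length];
    dest.offset = 0;
  }
  System.arraycopy(other.bytes, other.offset, dest.bytes, dest.offset,
      other.length);
  dest.length = other.length;
}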

> Consistently set offset=0 in BytesRef.copyBytes
> ---
>
> Key: LUCENE-4674
> URL: https://issues.apache.org/jira/browse/LUCENE-4674
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-4674.patch
>
>
> BytesRef.copyBytes(BytesRef other) has two branches:
>  - either the destination array is large enough and it will copy bytes after 
> offset,
>  - or it needs to resize and in that case it will set offset = 0.
> I think this method should always set offset = 0 for consistency, and to 
> avoid resizing when other.length is larger than this.bytes.length - 
> this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4674) Consistently set offset=0 in BytesRef.copyBytes

2013-01-10 Thread Adrien Grand (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13549544#comment-13549544
 ] 

Adrien Grand commented on LUCENE-4674:
--

bq. b.copy(someOtherStuff...) should NOT muck with a.

Unfortunately a.copy(otherStuff) will modify b if otherStuff.length > 5.

> Consistently set offset=0 in BytesRef.copyBytes
> ---
>
> Key: LUCENE-4674
> URL: https://issues.apache.org/jira/browse/LUCENE-4674
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Assignee: Adrien Grand
>Priority: Minor
> Attachments: LUCENE-4674.patch
>
>
> BytesRef.copyBytes(BytesRef other) has two branches:
>  - either the destination array is large enough and it will copy bytes after 
> offset,
>  - or it needs to resize and in that case it will set offset = 0.
> I think this method should always set offset = 0 for consistency, and to 
> avoid resizing when other.length is larger than this.bytes.length - 
> this.offset but smaller than this.bytes.length.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


