[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882621#comment-13882621 ] Shalin Shekhar Mangar commented on SOLR-5477: Thanks Anshum.
1. Why is it called a taskQueue in CoreAdminHandler? There is no queueing happening here.
2. Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It can simply be a Map<String, TaskObject>. The task object itself can contain a volatile status flag to indicate running/completed/failure.
3. CoreAdminHandler.addTask with limit=true just removes a random (first?) entry if the limit is reached. It should remove the oldest entry instead.
4. OverseerCollectionProcessor.requestStatus returns a response with "success" even if the requestid is found in the "running" or "failure" map.
5. The 'migrate' API doesn't use async core admin requests.
6. In all places where synchronous calls have been replaced with waitForAsyncCallsToComplete calls, we need to ensure that the correct response messages are returned on failures. Right now, the waitForAsyncCallToComplete method returns silently on detecting failure.
7. Although there is a provision to clear the overseer status maps by passing requestid=1, it is never actually called. When do you intend to call this API?
8. I don't understand why we need three different maps for running/completed/failure for the overseer collection processor. My comment #2 applies here too. We can store the status in the value bytes instead of keeping three different maps and moving the key around. What do you think?
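Taken together, points 2 and 3 above suggest a single bounded, insertion-ordered map. A minimal sketch of that idea using a plain JDK LinkedHashMap (the class and field names here are illustrative, not from the patch):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: a single flat map of request id -> TaskObject, where
// each task carries its own status flag and the map evicts its oldest entry
// once a size limit is reached.
class TaskObject {
    enum Status { RUNNING, COMPLETED, FAILED }
    // volatile so a status written by the worker thread is visible to the
    // thread answering status requests without extra locking
    volatile Status status = Status.RUNNING;
}

class BoundedTaskMap extends LinkedHashMap<String, TaskObject> {
    private final int limit;

    BoundedTaskMap(int limit) {
        this.limit = limit;
    }

    // LinkedHashMap calls this after every put(); returning true drops the
    // eldest entry in insertion order, i.e. the oldest task, not a random one.
    @Override
    protected boolean removeEldestEntry(Map.Entry<String, TaskObject> eldest) {
        return size() > limit;
    }
}
```

A handler serving concurrent requests would still need to wrap such a map with Collections.synchronizedMap (or guard it with its own locking), since LinkedHashMap itself is not thread-safe.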
Async execution of OverseerCollectionProcessor tasks Key: SOLR-5477 URL: https://issues.apache.org/jira/browse/SOLR-5477 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Anshum Gupta Attachments: SOLR-5477-CoreAdminStatus.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch, SOLR-5477.patch Typical collection admin commands are long-running and it is very common for the requests to time out. It is more of a problem if the cluster is very large. Add an option to run these commands asynchronously: add an extra param async=true for all collection commands; the task is written to ZK and the caller is returned a task id. A separate collection admin command will be added to poll the status of the task: command=status&id=7657668909. If no id is passed, all running async tasks should be listed. A separate queue is created to store in-process tasks. After the tasks are completed the queue entry is removed. OverseerCollectionProcessor will perform these tasks in multiple threads. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882627#comment-13882627 ] Anshum Gupta commented on SOLR-5477: bq. Why is it called a taskQueue in CoreAdminHandler? There is no queueing happening here. Changed it. Had that change on my machine before you mentioned :) bq. Why is the taskQueue defined as a Map<String, Map<String, TaskObject>>? It can simply be a Map<String, TaskObject>. {quote} I don't understand why we need three different maps for running/completed/failure for the overseer collection processor. My comment #2 applies here too. We can store the status in the value bytes instead of keeping three different maps and moving the key around. What do you think? {quote} It takes away the ability (or at least makes it too complicated) to limit the number of tasks in a particular state, e.g. limiting storage to 50 completed tasks only. bq. The CoreAdminHandler.addTask with limit=true just removes a random (first?) entry if the limit is reached. It removes the first element. It's a synchronized LinkedHashMap, so the iterator preserves order and returns the first element. bq. OverseerCollectionProcessor.requestStatus returns a response with "success" even if the requestid is found in the "running" or "failure" map. Success was supposed to mean that the task was found in a status map. It might actually make sense to change it. Thanks for the suggestion. bq. Although there is a provision to clear the overseer status maps by passing requestid=1, it is never actually called. The intention is for the user to explicitly call the API. There's no concept of a map/queue in ZK that maintains insertion state. You'd have to check it, order it, and then delete the apt one every time numChildren exceeds the limit. I thought it was best left to the user. Will upload a patch with the following: * Migrate API to also use the ASYNC CoreAdmin requests.
* Store the failed tasks information from CoreAdmin async calls in case of Collection API requests. * Tests for ** migratekey (and other calls) in ASYNC mode. ** Failing Collection API calls.
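The removal behaviour Anshum describes follows from LinkedHashMap's insertion-order iteration: the first key the iterator returns is the oldest entry, not an arbitrary one. A small, self-contained demonstration (the task names are made up):

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class OldestEntryDemo {
    public static void main(String[] args) {
        // A LinkedHashMap iterates in insertion order, so the first key its
        // iterator returns is the oldest entry.
        Map<String, String> tasks =
                Collections.synchronizedMap(new LinkedHashMap<String, String>());
        tasks.put("task-1", "completed");
        tasks.put("task-2", "completed");
        tasks.put("task-3", "completed");

        String oldest;
        // Iterating a synchronizedMap wrapper must be manually synchronized.
        synchronized (tasks) {
            oldest = tasks.keySet().iterator().next();
        }
        tasks.remove(oldest);

        System.out.println(oldest);         // task-1
        System.out.println(tasks.keySet()); // [task-2, task-3]
    }
}
```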
[jira] [Updated] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-5477: --- Attachment: SOLR-5477.patch Fixed the following: * Changed the var name from Queue to Map. * Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state></state>
  <msg></msg>
</status>
{code}
[jira] [Comment Edited] (SOLR-5477) Async execution of OverseerCollectionProcessor tasks
[ https://issues.apache.org/jira/browse/SOLR-5477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882641#comment-13882641 ] Anshum Gupta edited comment on SOLR-5477 at 1/27/14 8:53 AM: Fixed the following: * Changed the var name from Queue to Map. * Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state>running|failed|completed|notfound</state>
  <msg>apt message</msg>
</status>
{code}
was (Author: anshumg): Fixed the following: * Changed the var name from Queue to Map. * Response structure from OCP async calls changed. Now it's:
{code:xml}
<status>
  <state></state>
  <msg></msg>
</status>
{code}
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #1088: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/1088/ 1 tests failed. REGRESSION: org.apache.solr.cloud.OverseerTest.testOverseerFailure Error Message: KeeperErrorCode = NodeExists for /collections/collection1/leaders/shard1 Stack Trace: org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /collections/collection1/leaders/shard1 at __randomizedtesting.SeedInfo.seed([2BCA93FBB0E2264:6B426CCA9ABCD45]:0) at org.apache.zookeeper.KeeperException.create(KeeperException.java:119) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783) at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:428) at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:73) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:425) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:382) at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:369) at org.apache.solr.cloud.ShardLeaderElectionContextBase.runLeaderProcess(ElectionContext.java:112) at org.apache.solr.cloud.LeaderElector.runIamLeaderProcess(LeaderElector.java:164) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:108) at org.apache.solr.cloud.LeaderElector.checkIfIamLeader(LeaderElector.java:156) at org.apache.solr.cloud.LeaderElector.joinElection(LeaderElector.java:289) at org.apache.solr.cloud.OverseerTest$MockZKController.publishState(OverseerTest.java:153) at org.apache.solr.cloud.OverseerTest.testOverseerFailure(OverseerTest.java:584) Build Log: [...truncated 52851 lines...] 
BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:476: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/build.xml:176: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Maven-trunk/extra-targets.xml:77: Java returned: 1 Total time: 132 minutes 55 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure
Re: Span Not Queries
Hi, Any news on this? On Fri, Jan 17, 2014 at 1:54 AM, Gopal Agarwal gopal.agarw...@gmail.com wrote: Sounds perfect. Hopefully one of the committers picks this up and adds it to 4.7. Will keep checking the updates... On Fri, Jan 17, 2014 at 1:17 AM, Allison, Timothy B. talli...@mitre.org wrote: And don't forget analysis! The code is non-trivial, and it will take a generous committer to help me get it into shape for committing. Once I push my mods to JIRA (end of next week), you should be able to compile it and run it at least for dev/testing to confirm that it meets your needs. From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com] Sent: Thursday, January 16, 2014 1:21 PM To: dev@lucene.apache.org Subject: Re: Span Not Queries Thanks Tim. This exactly fits my requirements of recursion, SpanNot, and the ComplexParser combination with the Boolean parser. Since I would end up making the exact same changes to my QueryParserBase class, I would be locked to the current version of SOLR for the foreseeable future. Can you comment on when the release might be if it gets reviewed by next week? On Thu, Jan 16, 2014 at 11:06 PM, Allison, Timothy B. talli...@mitre.org wrote: Apologies for the self-promotion... LUCENE-5205 and its Solr cousin (SOLR-5410) might help. I'm hoping to post updates to both by the end of next week. Then, if a committer would be willing to review and add these to Lucene/Solr, you should be good to go. Take a look at the description for LUCENE-5205 and see if that capability will meet your needs. Thank you. Best, Tim From: Gopal Agarwal [mailto:gopal.agarw...@gmail.com] Sent: Thursday, January 16, 2014 4:10 AM To: dev@lucene.apache.org Subject: Fwd: Span Not Queries Please help me out with the earlier query. In short: 1. Can we change the QueryParser.jj file to identify the SpanNot query as a boolean clause? 2. Can we use the ComplexPhraseQuery parser to support SpanOR and SpanNOT queries also? For further explanation, following are the examples.
On Tue, Oct 15, 2013 at 11:27 PM, Ankit Kumar ankitthemight...@gmail.com wrote: I have a business use case in which I need to use SpanNot and other ordered proximity queries, and they can be nested up to any level: a Boolean inside an ordered query, or an ordered query inside a Boolean. Currently I am thinking of changing the QueryParser.jj file to identify the SpanNot query and use the Complex Phrase Query Parser of Lucene for parsing complex queries. Can you suggest a better way of achieving this? Following is the list of additions that I need to make in SOLR: 1. SpanNOT operator. 2. Recursive and range proximity. Recursive proximity is a proximity query within a proximity query. Ex: “ “income tax”~5 statement” ~4. The recursion can be up to any level. Range proximity: currently we can only define a number as a range; we want an interval as a range. Ex: “profit income”~3,5, “United America”~-5,4. 3. Complex queries. A complex query is a query formed with a combination of Boolean operators, proximity queries, range queries, or any possible combination of these. Ex: “(income AND tax) statement”~4, “ “income tax”~4 (statement OR period) ”~3, (“income” SPAN NOT “income tax”) source ~3,5. Can anyone suggest a way of achieving these 3 functionalities in SOLR? On Tue, Oct 15, 2013 at 10:15 PM, Jack Krupansky j...@basetechnology.com wrote: Nope. But the LucidWorks Search product query parser does support SpanNot if you use their BEFORE, AFTER, and NEAR span operators. See: http://docs.lucidworks.com/display/lweug/Proximity+Operations For example: George BEFORE:2 Bush NOT H to match George anything Bush, but not George H. W. Bush. What is your specific use case? -- Jack Krupansky -Original Message- From: Ankit Kumar Sent: Tuesday, October 15, 2013 3:58 AM To: solr-u...@lucene.apache.org Subject: Span Not Queries I need to add SpanNot queries in Solr.
There's a parser, the Surround Query Parser. I went through this ( http://lucene.472066.n3.nabble.com/Surround-query-parser-not-working-td4075066.html ) to discover that the surround query parser does not analyze text. Does DisMaxQueryParser support SpanNot queries?
[jira] [Updated] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-5473: - Attachment: SOLR-5473.patch Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882719#comment-13882719 ] Jan Høydahl commented on SOLR-2366: --- bq. So for a facet.range.start=0, facet.range.end=1000, facet.range.gap=10,90,900 the labels would be as Jan suggests: [0 TO 10}, [10 TO 100}, [100 TO 1000}. [~tedsullivan], I am not in favor of a list of relative gaps, I think it is user unfriendly. That's why I suggested a new facet.range.spec or something like Hoss' facet.range.buckets. But if you for some reason wish to extend the gap parameter, I guess it needs to remain relative gaps since that is kind of implied in the wording? Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.7 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced. For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+), for instance. We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Benson Margulies as Lucene/Solr committer!
Welcome Benson! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25 Jan 2014, at 22:40, Michael McCandless luc...@mikemccandless.com wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jirachg=ddsa1=allUsersa2=Benson+Margulies ), most recently on debugging tricky analysis issues. Benson, it is tradition that you introduce yourself with a brief bio. I know you're heavily involved in other Apache projects already... Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882723#comment-13882723 ] Dana Sava commented on SOLR-4470: Hello, We are currently using SOLR 4.5.1 in our production environment and we tried to set up security on a SOLR cloud configuration. I have read all the 4470 issue activity and it would be very useful for us to be able to download the SOLR-4470_branch_4x_r1452629.patch already compiled from some place, until the 4.7 version is released. Can somebody help me with this issue? Thank you, Dana Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP resource (URL). We want to require credentials no matter what kind of HTTP request you make to a Solr node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr nodes also make internal requests to other Solr nodes, and for those to work, credentials need to be provided as well. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered by outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way relate to an outside super-request (e.g. replica syncing stuff). We would like to aim at a solution where the original credentials are forwarded when a request directly/synchronously triggers a sub-request, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic HTTP auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but created this JIRA issue early in order to get input/comments from the community as early as possible.
Re: [VOTE] Release Lucene/Solr 4.6.1 RC4
I guess this vote passed! On Sat, Jan 25, 2014 at 1:15 AM, Andi Vajda va...@osafoundation.org wrote: On Thu, 23 Jan 2014, Mark Miller wrote: Sorry - watch out for that link - I'm seeing the text correctly, but the underlying link incorrectly when I look at it in my send folder. The evils of html mail I guess. +1 PyLucene built from branch_4x's rev 1560866 passes all its tests. Andi.. To be sure you have the right artifacts, make sure you are looking at the following location: http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/ - Mark On Jan 23, 2014, at 9:57 PM, Mark Miller markrmil...@gmail.com wrote: Here we go, hopefully for the last time now... thanks everyone for bearing with us. Please vote to release the following artifacts: http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/ Here is my +1. SUCCESS! [0:56:37.409716] -- - Mark - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Is there any way to lucene index incremented indexing or updated
I have built an index of around 1 TB of data. The problem is that I want to update or add more data to my Lucene database. Is there any way to add to or re-index the Lucene DB? Please give me some suggestions. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-any-way-to-lucene-index-incremented-indexing-or-updated-tp4113691.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com.
Re: Is there any way to lucene index incremented indexing or updated
Could you re-ask this on java-u...@lucene.apache.org? This list is for making changes to Lucene/Solr's source code ... thanks. Mike McCandless http://blog.mikemccandless.com On Mon, Jan 27, 2014 at 6:15 AM, mugeesh muge...@hitechpeople.in wrote: I had made index around 1 TB data. the problem is that i want to update or add more data in my lucene database . is there any way to add or re-index lucene Db ..Please give me some suggestion. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-any-way-to-lucene-index-incremented-indexing-or-updated-tp4113691.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-5418) Don't use .advance on costly (e.g. distance range facets) filters
Michael McCandless created LUCENE-5418: -- Summary: Don't use .advance on costly (e.g. distance range facets) filters Key: LUCENE-5418 URL: https://issues.apache.org/jira/browse/LUCENE-5418 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, 4.7 If you use a distance filter today (see http://blog.mikemccandless.com/2014/01/geospatial-distance-faceting-using.html ), then drill down on one of those ranges, under the hood Lucene is using .advance on the Filter, which is very costly because we end up computing distance on (possibly many) hits that don't match the query. It's better performance to find the hits matching the Query first, and then check the filter. FilteredQuery can already do this today, when you use its QUERY_FIRST_FILTER_STRATEGY. This essentially accomplishes the same thing as Solr's post filters (I think?) but with a far simpler/better/less code approach. E.g., I believe ElasticSearch uses this API when it applies costly filters. Longish term, I think Query/Filter ought to know itself that it's expensive, and cases where such a Query/Filter is MUST'd onto a BooleanQuery (e.g. ConstantScoreQuery), or the Filter is a clause in BooleanFilter, or it's passed to IndexSearcher.search, we should also be smart here and not call .advance on such clauses. But that'd be a biggish change ... so for today the workaround is the user must carefully construct the FilteredQuery themselves. In the mean time, as another workaround, I want to fix DrillSideways so that when you drill down on such filters it doesn't use .advance; this should give a good speedup for the normal path API usage with a costly filter. I'm iterating on the lucene server branch (LUCENE-5376) but once it's working I plan to merge this back to trunk / 4.7. 
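The query-first strategy described above is selected when constructing the FilteredQuery; a minimal sketch against the Lucene 4.x API (the helper class and method names are hypothetical, while FilteredQuery and QUERY_FIRST_FILTER_STRATEGY are the real API):

```java
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.FilteredQuery;
import org.apache.lucene.search.Query;

// Sketch: evaluate the query first and consult the (costly) filter only for
// documents that already matched, instead of letting the filter's .advance()
// drive the match loop.
public final class CostlyFilterWrapper {
    private CostlyFilterWrapper() {}

    public static Query queryFirst(Query query, Filter costlyFilter) {
        return new FilteredQuery(query, costlyFilter,
                FilteredQuery.QUERY_FIRST_FILTER_STRATEGY);
    }
}
```

The returned query can then be passed to IndexSearcher.search as usual; the strategy only changes how the filter is consulted, not which documents match.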
Re: Liberating DirectPostingFormat from Codec
What do we have for a benchmark framework that is used to justify/qualify speed-related things? One way forward would be to see what a quantified measurement shows from the idea I have in mind, and use that to facilitate deciding if this belongs in the tree. On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com wrote: Keeping things in memory and not re-reading them from disk is what really sang the song for us. Even if the initial read-in was more costly due to decompression, the long-term amortized benefit of not re-reading would still be a big winner. On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote: Well, the Directory layer likely isn't what makes DirectPF faster for you. It's probably the fact it does no compression at all... On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote: That would be Directory :) Oh, how embarrassing. I could have written a custom directory to begin with. Would a Directory class for this purpose be an interesting patch, in that case? I'm not discontented about building a Directory into our application, but it seems like I might not be the only person to find this useful. On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies bimargul...@gmail.com wrote: I've had very gratifying results using the DirectPostingFormat to speed up queries when I had a read-only index with plenty of memory. The only downside was the need to specify it within the Codec, and thus write it into the index. Ever since, I've wondered if we could change things to introduce the same goodness without building it into the codec. Very roughly, I'm imagining an option in the IndexReader to provide an object that can surround the codec that is called for in the stored format. Is this an old question? Is it worth sketching a patch?
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Liberating DirectPostingFormat from Codec
Hi Benson, I use the code from luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I run those scripts nightly for the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench But, that's the Wikipedia corpus, and has no real queries, and the scripts are quite challenging to get working ... if you have access to more realistic corpus + queries, even if you can't share it, those results are also interesting to share. I think it would be neat if an app could retroactively pick DirectPF at search time, or more generally pass search-time parameters when initializing codec components (I think there was a discussion about this at some point but I can't remember what the use case was). Today, any and all choices must be written into the index and cannot be changed at search time, which is somewhat silly/restrictive for DirectPF since it can wrap any other PF and act as simply a fast cache on top of the postings. Mike McCandless http://blog.mikemccandless.com On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies bimargul...@gmail.com wrote: What do we have for a benchmark framework that is used to justify/qualify speed-related things? One way forward would be to see what a quantified measurement shows from the idea I have in mind, and use that to facilitate deciding if this belongs in the tree. On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com wrote: Keeping things in memory and not re-reading them from disk is what really sang the song for us. Even if the initial read-in was more costly due to decompression, the long-term amortized benefit of not re-reading would still be a big winner. On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote: well the Directory layer likely isnt what probably makes DirectPF faster for you. Its probably the fact it does no compression at all... 
On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote: That would be Directory :) Oh, how embarrassing. I could have written a custom directory to begin with. Would a Directory class for this purpose be an interesting patch, in that case? I'm not discontented about building a Directory into our application, but it seems like I might not be the only person to find this useful. On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies bimargul...@gmail.com wrote: I've had very gratifying results using the DirectPostingFormat to speed up queries when I had a read-only index with plenty of memory. The only downside was the need to specify it within the Codec, and thus write it into the index. Ever since, I've wondered if we could change things to introduce the same goodness without building it into the codec. Very roughly, I'm imagining an option in the IndexReader to provide an object that can surround the codec that is called for in the stored format. Is this an old question? Is it worth sketching a patch?
Re: Liberating DirectPostingFormat from Codec
On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Benson, I use the code from luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I run those scripts nightly for the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench But, that's the Wikipedia corpus, and has no real queries, and the scripts are quite challenging to get working ... if you have access to more realistic corpus + queries, even if you can't share it, those results are also interesting to share. I think it would be neat if an app could retroactively pick DirectPF at search time, or more generally pass search-time parameters when initializing codec components (I think there was a discussion about this at some point but I can't remember what the use case was). Today, any and all choices must be written into the index and cannot be changed at search time, which is somewhat silly/restrictive for DirectPF since it can wrap any other PF and act as simply a fast cache on top of the postings. Well, that's where I thought I was starting: an API into the reader that allows DirectPF to be injected as a wrapper around others. I haven't had time to follow Rob's bread-crumb trail to see if this is straightforward by customizing Directory -- though it occurs to me that we have many directories, and it would be useful to be able to do this regardless. I may be able to share a data set, I'll check into that today. Mike McCandless http://blog.mikemccandless.com On Mon, Jan 27, 2014 at 7:06 AM, Benson Margulies bimargul...@gmail.com wrote: What do we have for a benchmark framework that is used to justify/qualify speed-related things? One way forward would be to see what a quantified measurement shows from the idea I have in mind, and use that to facilitate deciding if this belongs in the tree. 
On Sat, Jan 25, 2014 at 6:34 PM, Benson Margulies bimargul...@gmail.com wrote: Keeping things in memory and not re-reading them from disk is what really sang the song for us. Even if the initial read-in was more costly due to decompression, the long-term amortized benefit of not re-reading would still be a big winner. On Sat, Jan 25, 2014 at 5:37 PM, Robert Muir rcm...@gmail.com wrote: well, the Directory layer likely isn't what makes DirectPF faster for you. It's probably the fact that it does no compression at all... On Sat, Jan 25, 2014 at 5:34 PM, Benson Margulies bimargul...@gmail.com wrote: On Sat, Jan 25, 2014 at 5:09 PM, Robert Muir rcm...@gmail.com wrote: That would be Directory :) Oh, how embarrassing. I could have written a custom directory to begin with. Would a Directory class for this purpose be an interesting patch, in that case? I'm not discontented about building a Directory into our application, but it seems like I might not be the only person to find this useful. On Sat, Jan 25, 2014 at 5:03 PM, Benson Margulies bimargul...@gmail.com wrote: I've had very gratifying results using the DirectPostingFormat to speed up queries when I had a read-only index with plenty of memory. The only downside was the need to specify it within the Codec, and thus write it into the index. Ever since, I've wondered if we could change things to introduce the same goodness without building it into the codec. Very roughly, I'm imagining an option in the IndexReader to provide an object that can surround the codec that is called for in the stored format. Is this an old question? Is it worth sketching a patch? 
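[Editor's illustration] The idea discussed in this thread — DirectPF wrapping any other postings format and acting as a fast in-memory cache, chosen at search time rather than baked into the index — is essentially the decorator pattern. A minimal sketch of that pattern follows; the class and method names here are invented for illustration and are not Lucene APIs:

```python
# Hypothetical sketch of "DirectPF as a search-time cache" -- a decorator
# that wraps a slower postings source and keeps decoded postings in memory.

class PostingsSource:
    """Stand-in for a codec's postings format: maps a term to its doc IDs."""
    def __init__(self, index):
        self._index = index   # term -> list of doc IDs
        self.reads = 0        # counts "disk" reads, for the demo

    def postings(self, term):
        self.reads += 1
        return self._index.get(term, [])

class CachingPostingsSource:
    """Wraps any PostingsSource and caches decoded postings in memory,
    so the caching decision is made when opening the reader, not at index time."""
    def __init__(self, delegate):
        self._delegate = delegate
        self._cache = {}

    def postings(self, term):
        if term not in self._cache:
            self._cache[term] = self._delegate.postings(term)
        return self._cache[term]

base = PostingsSource({"lucene": [1, 3, 7]})
searcher_view = CachingPostingsSource(base)   # chosen at search time
first = searcher_view.postings("lucene")
second = searcher_view.postings("lucene")     # served from memory, no second read
```

The point of the sketch is only that the wrapper needs no cooperation from the wrapped format — which is why a search-time hook (whether via the reader or via Directory) would be enough to get the DirectPF speedup without writing the choice into the index.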
Re: Welcome Benson Margulies as Lucene/Solr committer!
Welcome Benson :) On Monday, January 27, 2014 at 10:57 AM, Jan Høydahl wrote: Welcome Benson! -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com On 25 Jan 2014, at 22:40, Michael McCandless luc...@mikemccandless.com wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dd&sa1=allUsers&a2=Benson+Margulies ), most recently on debugging tricky analysis issues. Benson, it is tradition that you introduce yourself with a brief bio. I know you're heavily involved in other Apache projects already... Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! Mike McCandless http://blog.mikemccandless.com
Re: Liberating DirectPostingFormat from Codec
On Mon, Jan 27, 2014 at 7:23 AM, Benson Margulies bimargul...@gmail.com wrote: On Mon, Jan 27, 2014 at 7:12 AM, Michael McCandless luc...@mikemccandless.com wrote: Hi Benson, I use the code from luceneutil (https://code.google.com/a/apache-extras.org/p/luceneutil/ ), e.g. I run those scripts nightly for the nightly benchmarks: http://people.apache.org/~mikemccand/lucenebench But, that's the Wikipedia corpus, and has no real queries, and the scripts are quite challenging to get working ... if you have access to more realistic corpus + queries, even if you can't share it, those results are also interesting to share. I think it would be neat if an app could retroactively pick DirectPF at search time, or more generally pass search-time parameters when initializing codec components (I think there was a discussion about this at some point but I can't remember what the use case was). Today, any and all choices must be written into the index and cannot be changed at search time, which is somewhat silly/restrictive for DirectPF since it can wrap any other PF and act as simply a fast cache on top of the postings. Well, that's where I thought I was starting: an API into the reader that allows DirectPF to be injected as a wrapper around others. I haven't had time to follow Rob's bread-crumb trail to see if this is straightforward by customizing Directory -- though it occurs to me that we have many directories, and it would be useful to be able to do this regardless. I'm not sure how a custom Directory applies here ... maybe Rob can clarify? I may be able to share a data set, I'll check into that today. Cool! Mike McCandless http://blog.mikemccandless.com
Re: Welcome Areek Zillur as Lucene/Solr committer!
Welcome Areek :) On Tuesday, January 21, 2014 at 8:26 PM, Robert Muir wrote: I'm pleased to announce that Areek Zillur has accepted to join our ranks as a committer. Areek has been improving suggester support in Lucene and Solr, including a revamped Solr component slated for the 4.7 release. [1] Areek, it is tradition that you introduce yourself with a brief bio. Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! [1] https://issues.apache.org/jira/browse/SOLR-5378
[jira] [Updated] (SOLR-4787) Join Contrib
[ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Bernstein updated SOLR-4787: - Attachment: SOLR-4787.patch Resolved a memory leak when the bjoin is used with cache autowarming. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory than the JoinQParserPlugin to perform the join. The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. 
The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* greater than 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. 
fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1, applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml. <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside the solr directory the joins contrib jar will appear in the solr/dist directory. Place the solr-joins-4.*-.jar in the WEB-INF/lib directory of the solr web application. This will
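[Editor's illustration] The join mechanics described above — collect the from-field values from the fromIndex results into a hash set, then keep only main-query documents whose to-field value appears in that set — can be sketched in a few lines. This is the general hash-set join idea, not Solr's implementation; the dict-based documents and field names are invented for the demo:

```python
# Illustrative hash-set join mirroring the hjoin description: one pass over
# the fromIndex side builds a set of join keys, then the main query results
# are filtered against that set.

def hash_set_join(from_docs, main_docs, from_field, to_field):
    keys = {d[from_field] for d in from_docs}            # keys from the fromIndex side
    return [d for d in main_docs if d[to_field] in keys] # keep matching main-query docs

# fromIndex (collection2) results for user:customer1
customers = [{"id_i": 5, "user": "customer1"}]
# main query results for group:5
orders = [{"id_i": 5, "group": 5}, {"id_i": 9, "group": 5}]

joined = hash_set_join(customers, orders, from_field="id_i", to_field="id_i")
```

This also shows why the *size* local parameter matters: pre-sizing the hash set to the expected number of fromIndex results avoids rehashing as keys are added (Python sets manage growth internally, but the resizing cost being avoided is the same).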
Jetty version should go in CHANGES.TXT
Hi, I'd argue that Jetty can be said to be a major component of Solr, so I suggest we add the Jetty version under the section Versions of Major Components in Solr's CHANGES.TXT? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Re: [VOTE] Release Lucene/Solr 4.6.1 RC4
Thanks everyone for voting. It’s been 72 hours, the vote has passed. - Mark http://about.me/markrmiller On Jan 23, 2014, at 9:57 PM, Mark Miller markrmil...@gmail.com wrote: Here we go, hopefully for the last time now…thanks everyone for bearing with us. Please vote to release the following artifacts: http://people.apache.org/~markrmiller/lucene_solr_4_6_1r1560866/ Here is my +1. SUCCESS! [0:56:37.409716] -- - Mark
[jira] [Updated] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.
[ https://issues.apache.org/jira/browse/SOLR-5669?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dorin Oltean updated SOLR-5669: --- Description: When I do the following query: /select?q=\ujb I get {quote} org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, {quote} To make it work I have to put another '\' in front of the query: {quote}\\ujb{quote} which in fact leads to a different query in Solr. I use the edismax qparser. was: When I do the following query: /select?q=\ujb I get org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, code 400 To make it work I have to put another '\' in front of the query: \\ujb which in fact leads to a different query in Solr. I use the edismax qparser. queries containing \u return error: Truncated unicode escape sequence. - Key: SOLR-5669 URL: https://issues.apache.org/jira/browse/SOLR-5669 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.4 Reporter: Dorin Oltean Priority: Minor When I do the following query: /select?q=\ujb I get {quote} org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, {quote} To make it work I have to put another '\' in front of the query: {quote}\\ujb{quote} which in fact leads to a different query in Solr. I use the edismax qparser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Created] (SOLR-5669) queries containing \u return error: Truncated unicode escape sequence.
Dorin Oltean created SOLR-5669: -- Summary: queries containing \u return error: Truncated unicode escape sequence. Key: SOLR-5669 URL: https://issues.apache.org/jira/browse/SOLR-5669 Project: Solr Issue Type: Bug Components: query parsers Affects Versions: 4.4 Reporter: Dorin Oltean Priority: Minor When I do the following query: /select?q=\ujb I get org.apache.solr.search.SyntaxError: Non-hex character in Unicode escape sequence: j, code 400 To make it work I have to put another '\' in front of the query: \\ujb which in fact leads to a different query in Solr. I use the edismax qparser. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
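[Editor's illustration] The behavior reported here is a property of \uXXXX escape parsing in general, not only of Solr's query parser. Python's unicode_escape codec (used below purely as an analogy, not as Solr's actual code path) fails the same way when \u is followed by a non-hex character, and doubling the backslash makes it a literal — which is exactly why the workaround produces a different query:

```python
# \uXXXX escapes require four hex digits; \ujb fails because 'j' is not hex.
# Escaping the backslash (\\u) yields a literal backslash-u -- a different string.

def try_decode(s):
    """Decode \\uXXXX escapes; return the error reason if parsing fails."""
    try:
        return s.encode("latin-1").decode("unicode_escape")
    except UnicodeDecodeError as exc:
        return f"error: {exc.reason}"

a = try_decode(r"\ujb")    # 'j' is not a hex digit -> parse error
b = try_decode(r"\\ujb")   # escaped: a literal \ujb, i.e. a different query
c = try_decode(r"\u0041")  # a well-formed escape
```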
[jira] [Created] (SOLR-5670) _version_ either indexed OR docvalue
Per Steffensen created SOLR-5670: Summary: _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen As far as I can see there is no good reason to require that the _version_ field has to be indexed if it has docValues. So I guess it will be OK with a rule saying _version_ has to be either indexed or docValues (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
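[Editor's illustration] The proposed rule relaxes "indexed is required" to "indexed or docValues is required (both allowed)". A sketch of that validation check, with a hypothetical helper name (this is not the patch itself):

```python
# Hypothetical validation helper illustrating the proposed _version_ rule:
# accept indexed, docValues, or both; reject a field that is neither.

def validate_version_field(indexed: bool, doc_values: bool) -> None:
    if not (indexed or doc_values):
        raise ValueError("_version_ field must be indexed and/or have docValues")

validate_version_field(indexed=True, doc_values=False)   # old-style config: OK
validate_version_field(indexed=False, doc_values=True)   # newly allowed: OK
validate_version_field(indexed=True, doc_values=True)    # both: OK

try:
    validate_version_field(indexed=False, doc_values=False)
    rejected = False
except ValueError:
    rejected = True  # neither indexed nor docValues is still an error
```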
[jira] [Updated] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-5670: - Attachment: SOLR-5670.patch Simple patch attached. No tests added for it, but I have seen it working locally. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch As far as I can see there is no good reason to require that the _version_ field has to be indexed if it has docValues. So I guess it will be OK with a rule saying _version_ has to be either indexed or docValues (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Comment Edited] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882811#comment-13882811 ] Per Steffensen edited comment on SOLR-5670 at 1/27/14 3:38 PM: --- Simple patch attached. No tests added for it, but I have seen it working locally. The 4.4.0 test-suite is green with this change. Do not know if the branch_4x test-suite is. was (Author: steff1193): Simple patch attached. No tests added for it, but I have seen it working locally. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that the _version_ field has to be indexed if it has docValues. So I guess it will be OK with a rule saying _version_ has to be either indexed or docValues (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Updated] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shawn Heisey updated SOLR-5670: --- Attachment: SOLR-5670.patch From a design perspective, I can't claim to know whether this is an acceptable patch or not. Consistency in configs across multiple users and multiple versions does have some value, which is a very minor argument against this change. Is there any benchmark data? If docValues provides better performance for _version_ than indexed when it is used for its intended purpose, it might be worth changing the example config ... but people should know that if they *do* change the config on this field, they will have to completely reindex. This patch is functionally identical to the previous one, it just modifies an error message. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. This patch is against that branch. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882899#comment-13882899 ] Shawn Heisey edited comment on SOLR-5670 at 1/27/14 3:41 PM: - From a design perspective, I can't claim to know whether this is an acceptable patch or not. Consistency in configs across multiple users and multiple versions does have some value, which is a very minor argument against this change. Is there any benchmark data? If docValues provides better performance for \_version\_ than indexed when it is used for its intended purpose, it might be worth changing the example config ... but people should know that if they *do* change the config on this field, they will have to completely reindex. This patch is functionally identical to the previous one, it just modifies an error message. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. This patch is against that branch. was (Author: elyograg): From a design perspective, I can't claim to know whether this is an acceptable patch or not. Consistency in configs across multiple users and multiple versions does have some value, which is a very minor argument against this change. Is there any benchmark data? If docValues provides better performance for _version_ than indexed when it is used for its intended purpose, it might be worth changing the example config ... but people should know that if they *do* change the config on this field, they will have to completely reindex. This patch is functionally identical to the previous one, it just modifies an error message. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. This patch is against that branch. 
_version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [jira] [Updated] (SOLR-4787) Join Contrib
Is this also applicable to the hjoin? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Mon, Jan 27, 2014 at 7:27 AM, Joel Bernstein (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Joel Bernstein updated SOLR-4787: - Attachment: SOLR-4787.patch Resolved a memory leak when the bjoin is used with cache autowarming. Join Contrib Key: SOLR-4787 URL: https://issues.apache.org/jira/browse/SOLR-4787 Project: Solr Issue Type: New Feature Components: search Affects Versions: 4.2.1 Reporter: Joel Bernstein Priority: Minor Fix For: 4.7 Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787-pjoin-long-keys.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch This contrib provides a place where different join implementations can be contributed to Solr. This contrib currently includes 3 join implementations. The initial patch was generated from the Solr 4.3 tag. Because of changes in the FieldCache API this patch will only build with Solr 4.2 or above. *HashSetJoinQParserPlugin aka hjoin* The hjoin provides a join implementation that filters results in one core based on the results of a search in another core. This is similar in functionality to the JoinQParserPlugin but the implementation differs in a couple of important ways. The first way is that the hjoin is designed to work with int and long join keys only. So, in order to use hjoin, int or long join keys must be included in both the to and from core. The second difference is that the hjoin builds memory structures that are used to quickly connect the join keys. So, the hjoin will need more memory then the JoinQParserPlugin to perform the join. 
The main advantage of the hjoin is that it can scale to join millions of keys between cores and provide sub-second response time. The hjoin should work well with up to two million results from the fromIndex and tens of millions of results from the main query. The hjoin supports the following features: 1) Both lucene query and PostFilter implementations. A *cost* > 99 will turn on the PostFilter. The PostFilter will typically outperform the Lucene query when the main query results have been narrowed down. 2) With the lucene query implementation there is an option to build the filter with threads. This can greatly improve the performance of the query if the main query index is very large. The threads parameter turns on threading. For example *threads=6* will use 6 threads to build the filter. This will set up a fixed threadpool with six threads to handle all hjoin requests. Once the threadpool is created the hjoin will always use it to build the filter. Threading does not come into play with the PostFilter. 3) The *size* local parameter can be used to set the initial size of the hashset used to perform the join. If this is set above the number of results from the fromIndex then you can avoid hashset resizing, which improves performance. 4) Nested filter queries. The local parameter fq can be used to nest a filter query within the join. The nested fq will filter the results of the join query. This can point to another join to support nested joins. 5) Full caching support for the lucene query implementation. The filterCache and queryResultCache should work properly even with deep nesting of joins. Only the queryResultCache comes into play with the PostFilter implementation because PostFilters are not cacheable in the filterCache. The syntax of the hjoin is similar to the JoinQParserPlugin except that the plugin is referenced by the string hjoin rather than join. 
fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 fq=$qq}user:customer1&qq=group:5 The example filter query above will search the fromIndex (collection2) for user:customer1 applying the local fq parameter to filter the results. The lucene filter query will be built using 6 threads. This query will generate a list of values from the from field that will be used to filter the main query. Only records from the main query, where the to field is present in the from list, will be included in the results. The solrconfig.xml in the main query core must contain the reference to the hjoin. <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/> And the join contrib lib jars must be registered in the solrconfig.xml. <lib dir="../../../contrib/joins/lib" regex=".*\.jar" /> After issuing the ant dist command from inside
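Pulling the parameters above together, a hedged sketch of a request that combines the threads, size, and nested-fq features described earlier (the second join against collection3, the field names, and the values are hypothetical, not from the patch):

```
q=*:*
&fq={!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 size=200000 fq=$inner}user:customer1
&inner={!hjoin fromIndex=collection3 from=acct_i to=acct_i}status:active
```

The outer hjoin pre-sizes its hashset and builds its filter with 6 threads; its nested fq dereferences $inner, which resolves to a second hjoin, giving a nested join.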
[jira] [Commented] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882849#comment-13882849 ] Erik Hatcher commented on SOLR-5658: [~markmil...@gmail.com] Is this ticket complete as of Solr 4.6.1? Just wondering if it can be closed. Thanks! commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl "http://localhost:8983/solr/update?commitWithin=1" -H "Content-Type: text/xml" --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false In Solr 4.6.x there is only one hard commit with openSearcher=false So even after 10 seconds queries on none of the shards reflect the added document. 
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/&update.distrib=TOLEADER&wt=javabin&version=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/&update.distrib=FROMLEADER&wt=javabin&version=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 384553
Re: [jira] [Updated] (SOLR-4787) Join Contrib
Kranti, The memory leak in the bjoin dealt with the multi-value field joins. Specifically how the new UninvertedIntField cache was used in the bjoin. In a quick review of the hjoin I'm not seeing the same issue but it would be good to confirm through testing. Joel Joel Bernstein Search Engineer at Heliosearch On Mon, Jan 27, 2014 at 10:06 AM, Kranti Parisa kranti.par...@gmail.com wrote: Does this also apply to the hjoin? Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa
[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882884#comment-13882884 ] ASF subversion and git services commented on SOLR-5671: --- Commit 1561711 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1561711 ] SOLR-5671: increase logging to try and track down test failure (merged trunk r1561709) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Welcome Benson Margulies as Lucene/Solr committer!
Congratulations Benson! Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Mon, Jan 27, 2014 at 9:32 AM, Alan Woodward a...@flax.co.uk wrote: Congratulations and welcome, Benson! Alan Woodward www.flax.co.uk On 26 Jan 2014, at 17:43, Shawn Heisey wrote: On 1/25/2014 2:40 PM, Michael McCandless wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dd&sa1=allUsers&a2=Benson+Margulies ), most recently on debugging tricky analysis issues. Congratulations and welcome! One more to try and keep me in line.
[jira] [Commented] (LUCENE-5414) suggest module should not depend on expression module
[ https://issues.apache.org/jira/browse/LUCENE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882879#comment-13882879 ] ASF subversion and git services commented on LUCENE-5414: - Commit 1561708 from [~steve_rowe] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1561708 ] LUCENE-5414: intellij config (merged trunk r1561707) suggest module should not depend on expression module - Key: LUCENE-5414 URL: https://issues.apache.org/jira/browse/LUCENE-5414 Project: Lucene - Core Issue Type: Wish Affects Versions: 4.6, 5.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 5.0, 4.7 Attachments: LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch Currently our suggest module depends on the expression module just because the DocumentExpressionDictionary provides some util ctor to pass in an expression directly. That is a lot of dependency for little value IMO and pulls in lots of JARs. DocumentExpressionDictionary should only take a ValueSource instead.
Re: Welcome Benson Margulies as Lucene/Solr committer!
Congratulations and welcome, Benson! Alan Woodward www.flax.co.uk On 26 Jan 2014, at 17:43, Shawn Heisey wrote: On 1/25/2014 2:40 PM, Michael McCandless wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jira&chg=dd&sa1=allUsers&a2=Benson+Margulies ), most recently on debugging tricky analysis issues. Congratulations and welcome! One more to try and keep me in line.
[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882881#comment-13882881 ] ASF subversion and git services commented on SOLR-5671: --- Commit 1561709 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1561709 ] SOLR-5671: increase logging to try and track down test failure Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me.
Re: Jetty version should go in CHANGES.TXT
+1 koji -- http://soleami.com/blog/mahout-and-machine-learning-training-course-is-here.html (14/01/27 21:44), Jan Høydahl wrote: Hi, I'd argue that Jetty can be said to be a major component of Solr, so I suggest we add the Jetty version under the section Versions of Major Components in Solr's CHANGES.TXT? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5652: - Description: Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * fails occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) was: Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * fails occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * fails occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments)
Re: [jira] [Updated] (SOLR-4787) Join Contrib
Thanks Joel. I shall look into that. Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa On Mon, Jan 27, 2014 at 10:19 AM, Joel Bernstein joels...@gmail.com wrote: Kranti, The memory leak in the bjoin dealt with the multi-value field joins. Specifically how the new UninvertedIntField cache was used in the bjoin. In a quick review of the hjoin I'm not seeing the same issue but it would be good to confirm through testing. Joel Joel Bernstein Search Engineer at Heliosearch
[jira] [Commented] (LUCENE-5416) Performance of a FixedBitSet variant that uses Long.numberOfTrailingZeros()
[ https://issues.apache.org/jira/browse/LUCENE-5416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882919#comment-13882919 ] Paul Elschot commented on LUCENE-5416: -- The last benchmark output is here: https://github.com/PaulElschot/lucene-solr/commit/772b55ad3c3d94752b37aa81b2e96cb50b321cf6 , see from line 313 in this output; the comparisons and loads are given as log10 numbers. In short: - for advance() this is a factor of 1.7 to 4 times faster, and - for nextDoc() this is up to 2.5 times faster, but for load factors higher than about 0.25 it is up to about 5 times slower. Performance of a FixedBitSet variant that uses Long.numberOfTrailingZeros() --- Key: LUCENE-5416 URL: https://issues.apache.org/jira/browse/LUCENE-5416 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 5.0 Reporter: Paul Elschot Priority: Minor Fix For: 5.0 On my machine the current byte index used in OpenBitSetIterator is slower than Long.numberOfTrailingZeros() for advance(). The pull request contains the code for benchmarking this taken from an early stage of DocBlocksIterator. In case the benchmark shows improvements on more machines, well, we know what to do...
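For readers following along, a minimal, hedged sketch of the technique under discussion (not the benchmark code from the pull request): walking the set bits of a bitset's long[] words with Long.numberOfTrailingZeros() plus lowest-bit clearing, rather than a byte-index lookup table. Class and method names here are illustrative.

```java
// Illustrative only: iterate set bits of a word-packed bitset using
// Long.numberOfTrailingZeros(), the approach benchmarked in LUCENE-5416.
public class NtzBitsetWalk {

    // Collect the indices of all set bits, in ascending order.
    static int[] setBits(long[] words, int expectedCount) {
        int[] out = new int[expectedCount];
        int n = 0;
        for (int w = 0; w < words.length; w++) {
            long bits = words[w];
            while (bits != 0L) {
                // (w << 6) is this word's bit offset; ntz finds the lowest set bit.
                out[n++] = (w << 6) + Long.numberOfTrailingZeros(bits);
                bits &= bits - 1L; // clear the lowest set bit and continue
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] words = new long[2]; // 128 bits
        int[] docs = {0, 3, 63, 64, 100};
        for (int d : docs) words[d >> 6] |= 1L << (d & 63);
        int[] got = setBits(words, docs.length);
        for (int i = 0; i < docs.length; i++) {
            if (got[i] != docs[i]) throw new AssertionError("mismatch at " + i);
        }
        System.out.println("ok");
    }
}
```

At high load factors a dense word has many set bits, so the per-bit ntz loop runs longer per word, which is consistent with the nextDoc() slowdown Paul reports above a load factor of about 0.25.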
[jira] [Commented] (LUCENE-5414) suggest module should not depend on expression module
[ https://issues.apache.org/jira/browse/LUCENE-5414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882873#comment-13882873 ] ASF subversion and git services commented on LUCENE-5414: - Commit 1561707 from [~steve_rowe] in branch 'dev/trunk' [ https://svn.apache.org/r1561707 ] LUCENE-5414: intellij config suggest module should not depend on expression module - Key: LUCENE-5414 URL: https://issues.apache.org/jira/browse/LUCENE-5414 Project: Lucene - Core Issue Type: Wish Affects Versions: 4.6, 5.0 Reporter: Simon Willnauer Assignee: Simon Willnauer Fix For: 5.0, 4.7 Attachments: LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch, LUCENE-5414.patch Currently our suggest module depends on the expression module just because the DocumentExpressionDictionary provides some util ctor to pass in an expression directly. That is a lot of dependency for little value IMO and pulls in lots of JARs. DocumentExpressionDictionary should only take a ValueSource instead.
[jira] [Created] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
Steve Rowe created SOLR-5671: Summary: Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me.
[jira] [Commented] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882885#comment-13882885 ] Steve Rowe commented on SOLR-5671: -- I committed a change to DistribCursorPagingTest that will print the details of the indexed doc(s) not returned by deep paging. Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me.
[jira] [Commented] (SOLR-2366) Facet Range Gaps
[ https://issues.apache.org/jira/browse/SOLR-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882890#comment-13882890 ] Ted Sullivan commented on SOLR-2366: Right. I'm following [~shalinmangar]'s suggestion to split out your/Hoss's facet.range.spec / facet.sequence idea as a separate issue. I don't think of this as extending the gap parameter - I am just providing more explicit information in the response as to what gaps you actually get (as per your suggestion of Sept/2011) - similar to what you would get if you implemented this using facet.query. Looking at the current code, it is pretty easy to add the range information to the response (right now the response labels are just the gap starts). This may be user-unfriendly as you say, but I would argue that it is more friendly than what we have right now - it is certainly more developer-friendly because it provides better feedback. There is a lot of interest in this feature (it has been advertised on the SimpleFacetsParameter Wiki for some time now) as evidenced by earlier comments in this thread. My original desire was just to make (the patch) usable for those that want to use it by upgrading Grant's original patch so that it would work with the new(?) modular class organization. The work required to spiff up the facet.range.gap response is not large. I haven't attempted the facet.range.spec/buckets approach, but that would seem to require more effort. Facet Range Gaps Key: SOLR-2366 URL: https://issues.apache.org/jira/browse/SOLR-2366 Project: Solr Issue Type: Improvement Reporter: Grant Ingersoll Assignee: Shalin Shekhar Mangar Priority: Minor Fix For: 4.7 Attachments: SOLR-2366.patch, SOLR-2366.patch, SOLR-2366.patch There really is no reason why the range gap for date and numeric faceting needs to be evenly spaced.
For instance, if and when SOLR-1581 is completed and one were doing spatial distance calculations, one could facet by function into 3 different-sized buckets: walking distance (0-5KM), driving distance (5KM-150KM) and everything else (150KM+). We should be able to quantize the results into arbitrarily sized buckets. (Original syntax proposal removed, see discussion for concrete syntax)
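The quantization being asked for can be sketched outside Solr. A minimal illustration in plain Java (not Solr code; bucket boundaries taken from the distance example above — this is effectively what one gets today with three facet.query parameters):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: counting values into arbitrarily sized buckets,
// as the issue proposes for range faceting.
public class ArbitraryBuckets {
    // Three uneven buckets: walking, driving, everything else.
    static String bucketFor(double distanceKm) {
        if (distanceKm < 5) return "0-5KM";
        if (distanceKm < 150) return "5KM-150KM";
        return "150KM+";
    }

    public static void main(String[] args) {
        double[] distances = {1.2, 42.0, 3.9, 500.0, 149.9};
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (double d : distances) {
            counts.merge(bucketFor(d), 1, Integer::sum);
        }
        System.out.println(counts); // {0-5KM=2, 5KM-150KM=2, 150KM+=1}
    }
}
```

The same counts could be produced in Solr today with facet.query=dist:[0 TO 5], facet.query=dist:[5 TO 150], facet.query=dist:[150 TO *]; the proposal is to make the range-faceting machinery do this directly.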
[jira] [Commented] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882933#comment-13882933 ] Per Steffensen commented on SOLR-5670: -- bq. Is there any benchmark data? If docValues provides better performance for _version_ than indexed I do not think it will in most cases. * Indexed: When you want to get the _version_ for a particular doc-no (found by id), you can make a lookup in the FieldCache holding the reversed term-index - this is in memory and constant time. If you have a very rapidly changing data-set (so that FieldCache entries will be invalidated often due to merging) you might get better performance (response-time) with doc-values - but not in general, I think. * DocValues: You will read the _version_ from doc-values, which is not necessarily in memory. We are prepared to take a small performance hit to avoid having all that data in the FieldCache. In general we do not allow putting anything in the FieldCache, because we have so many documents that it always creates issues with too much memory usage. The problem with the FieldCache is that it is all or nothing - for good reasons! - we just cannot live with it. We haven't made the change on _version_ (going from indexed to doc-value) in production yet. We will do some performance testing on it first, and depending on how much we decide to do, I can get back with some numbers. bq. when it is used for its intended purpose, it might be worth changing the example config I do not think you should do that. Using the FieldCache is probably the best default. But writing something somewhere about the option of using doc-values instead of indexed, and when that is a good idea, would be nice. bq. ... but people should know that if they do change the config on this field, they will have to completely reindex. Or just start using it from now on in new collections. We create a new collection every month and keep a history of data by keeping the latest 24 collections.
One of many reasons for doing this is that it gives us the option of changing indexing strategy etc. every month. For us re-indexing is completely out of the question - we have billions and billions of records in Solr and re-indexing them all in a fairly short service-window is not possible. Therefore we built this new-collection-every-month thingy in order to have some flexibility. bq. This patch is functionally identical to the previous one, it just modifies an error message. Nicely spotted. bq. I didn't check to see what branch Per's patch was created on, but it did apply cleanly to branch_4x. It was branch_4x. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that the _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalued (allowed to be both).
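Under the rule this issue proposes, a schema could then declare _version_ as docValues-only, keeping it out of the FieldCache. A hypothetical schema fragment (attribute values illustrative, not taken from the patch):

```xml
<!-- Hypothetical: _version_ readable via doc values only, so version
     lookups read doc values instead of populating the FieldCache.
     Today Solr requires indexed="true" on this field. -->
<field name="_version_" type="long" indexed="false" stored="true" docValues="true"/>
```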
Welcome Benson Margulies as Lucene/Solr committer!
Welcome! On Jan 25, 2014 1:41 PM, Michael McCandless <luc...@mikemccandless.com> wrote: I'm pleased to announce that Benson Margulies has accepted to join our ranks as a committer. Benson has been involved in a number of Lucene/Solr issues over time (see http://jirasearch.mikemccandless.com/search.py?index=jirachg=ddsa1=allUsersa2=Benson+Margulies ), most recently on debugging tricky analysis issues. Benson, it is tradition that you introduce yourself with a brief bio. I know you're heavily involved in other Apache projects already... Once your account is set up, you should then be able to add yourself to the who we are page on the website as well. Congratulations and welcome! Mike McCandless http://blog.mikemccandless.com
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882832#comment-13882832 ] Per Steffensen commented on SOLR-4470: -- bq. We are currently using SOLR 4.5.1 in our production environment and we tried to setup security on a SOLR cloud configuration. Container-managed authentication and authorization, I presume? bq. I have read all the 4470 issue activity and it will be very useful for us to be able to download the SOLR-4470_branch_4x_r1452629.patch already compiled from some place, until the 4.7 version is released. Guess you are looking at Fix Version/s: 4.7 on this issue, and expect that this means that the fix will be in 4.7. I do not believe it will - unfortunately. So if you want the feature, you need to change the patch yourself to fit the version of Solr you are using, or you can download code for Solr 4.4 plus numerous improvements (including SOLR-4470) here: https://github.com/steff1193/lucene-solr. You will have to build a Solr distribution yourself - and maven artifacts if you need those.
* Building distribution from source
{code}
checkout
cd solr
ant -Dversion=4.4.0.myversion clean create-package
{code}
* Building and deploying artifacts is a little more complicated. Let me know if you need that.
*Please note* that https://github.com/steff1193/lucene-solr is only a place where we keep our version of Lucene/Solr, including the changes we have made which have not yet been committed to Apache Solr. You are free to use it, but there is no guarantee that there will ever be a version based on an Apache Solr version higher than 4.4. It is very likely that there will be, but no guarantee, and you never know when it will happen. Of course it is all open source, so if you really want you can fork it yourself.
Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for it to work credentials need to be provided here also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way have a relation to an outside super-request (e.g. replica syncing stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a subrequest, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic http auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but created this JIRA issue early in order to get input/comments from the community as early as possible.
[jira] [Resolved] (SOLR-5658) commitWithin does not reflect the new documents added
[ https://issues.apache.org/jira/browse/SOLR-5658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-5658. --- Resolution: Fixed commitWithin does not reflect the new documents added - Key: SOLR-5658 URL: https://issues.apache.org/jira/browse/SOLR-5658 Project: Solr Issue Type: Bug Affects Versions: 4.6, 5.0 Reporter: Varun Thacker Assignee: Mark Miller Priority: Critical Fix For: 5.0, 4.7, 4.6.1 Attachments: SOLR-5658.patch, SOLR-5658.patch I start 4 nodes using the setup mentioned on - https://cwiki.apache.org/confluence/display/solr/Getting+Started+with+SolrCloud I added a document using - curl 'http://localhost:8983/solr/update?commitWithin=1' -H 'Content-Type: text/xml' --data-binary '<add><doc><field name="id">testdoc</field></doc></add>' In Solr 4.5.1 there is 1 soft commit with openSearcher=true and 1 hard commit with openSearcher=false. In Solr 4.6.x there is only one hard commit with openSearcher=false. So even after 10 seconds queries on none of the shards reflect the added document.
This was also reported on the solr-user list ( http://lucene.472066.n3.nabble.com/Possible-regression-for-Solr-4-6-0-commitWithin-does-not-work-with-replicas-td4106102.html ) Here are the relevant logs Logs from Solr 4.5.1 Node 1: {code} 420021 [qtp619011445-12] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={commitWithin=1} {add=[testdoc]} 0 45 {code} Node 2: {code} 119896 [qtp1608701025-10] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:8983/solr/collection1/update.distrib=TOLEADERwt=javabinversion=2} {add=[testdoc (1458003295513608192)]} 0 348 129648 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 129679 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@e174f70 main 129680 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – QuerySenderListener done. 
129681 [searcherExecutor-5-thread-1] INFO org.apache.solr.core.SolrCore – [collection1] Registered new searcher Searcher@e174f70 main{StandardDirectoryReader(segments_3:11:nrt _2(4.5.1):C1)} 134648 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – SolrDeletionPolicy.onCommit: commits: num=2 commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_3,generation=3} commit{dir=NRTCachingDirectory(org.apache.lucene.store.NIOFSDirectory@/Users/varun/solr-4.5.1/node2/solr/collection1/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@66a394a3; maxCacheMB=48.0 maxMergeSizeMB=4.0),segFN=segments_4,generation=4} 134658 [commitScheduler-7-thread-1] INFO org.apache.solr.core.SolrCore – newest commit generation = 4 134660 [commitScheduler-7-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush {code} Node 3: Node 4: {code} 374545 [qtp1608701025-16] INFO org.apache.solr.update.processor.LogUpdateProcessor – [collection1] webapp=/solr path=/update params={distrib.from=http://192.168.1.103:7574/solr/collection1/update.distrib=FROMLEADERwt=javabinversion=2} {add=[testdoc (1458002133233172480)]} 0 20 384545 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – start commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false} 384552 [commitScheduler-8-thread-1] INFO org.apache.solr.search.SolrIndexSearcher – Opening Searcher@36137e08 main 384553 [commitScheduler-8-thread-1] INFO org.apache.solr.update.UpdateHandler – end_commit_flush 384553 [searcherExecutor-5-thread-1] INFO 
org.apache.solr.core.SolrCore – QuerySenderListener sending requests to Searcher@36137e08
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5652: - Description: Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) was: Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it.
Summary of things noticed so far (in 3 failures): * So far only seen on http://jenkins.thetaphi.de sarowe's mac * So far only seen on MacOSX * So far only seen on branch 4x * So far seen on both Java6 and Java7 * fails occured in first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * fails were both when doing a desc sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** sort on same field asc has always worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) Updated summary Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Twice now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
[jira] [Commented] (SOLR-797) Construct EmbeddedSolrServer response without serializing/parsing
[ https://issues.apache.org/jira/browse/SOLR-797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882944#comment-13882944 ] Gregg Donovan commented on SOLR-797: I'm interested in this as well. We had a custom API that was similar to the attached patch. When we switched to EmbeddedSolrServer we noticed an increase in time spent deserializing the Solr response, memory allocated, and GC spikiness. One issue with the current EmbeddedSolrServer code is that it starts with a ByteArrayOutputStream of 32 bytes and repeatedly resizes it to fit the results. We have large responses and we notice the GC hit. We experimented with a ThreadLocal<ByteBuffer>, but avoiding serializing and parsing altogether for EmbeddedSolrServer seems like an even better idea. If there's interest, we'd be happy to revive/update/test this patch. Construct EmbeddedSolrServer response without serializing/parsing - Key: SOLR-797 URL: https://issues.apache.org/jira/browse/SOLR-797 Project: Solr Issue Type: Improvement Components: clients - java Affects Versions: 1.3 Reporter: Jonathan Lee Priority: Minor Fix For: 4.7 Attachments: SOLR-797.patch, SOLR-797.patch Currently, the EmbeddedSolrServer serializes the response and reparses it in order to create the final NamedList response. From the comment in EmbeddedSolrServer.java, the goal is to: * convert the response directly into a named list
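The growth behavior Gregg describes can be illustrated with plain JDK code. A sketch (not Solr code) modeling the standard java.io.ByteArrayOutputStream, whose default backing array starts at 32 bytes and roughly doubles on each resize, copying the old contents each time:

```java
import java.io.ByteArrayOutputStream;

// Illustration: how many capacity doublings a default (32-byte) buffer
// needs to hold n bytes, and how presizing avoids the intermediate
// allocations that all become garbage.
public class BaosGrowth {
    static int resizesNeeded(int n) {
        int capacity = 32, resizes = 0;
        while (capacity < n) {
            capacity <<= 1;  // model: capacity doubles on each resize
            resizes++;
        }
        return resizes;
    }

    public static void main(String[] args) {
        int payload = 1 << 20; // a 1 MB serialized response
        System.out.println(resizesNeeded(payload)); // 15 doublings: 32 -> 1 MB

        // Presizing the stream means the write lands in one allocation
        // instead of ~15 progressively larger arrays.
        ByteArrayOutputStream presized = new ByteArrayOutputStream(payload);
        presized.write(new byte[payload], 0, payload);
        System.out.println(presized.size()); // 1048576
    }
}
```

This is why skipping the serialize/parse round-trip entirely (the point of this issue) beats both presizing and a thread-local buffer: no intermediate byte array is needed at all.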
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882958#comment-13882958 ] Steve Rowe commented on SOLR-5652: -- bq. It looks to me like there are two problems here: 1) the same doc is showing up on different pages when deep paging; and 2) missing docvalue docs are sorted incorrectly. I think I understand problem #2: non-multi-valued numeric and string fields are created (by TrieField's and StrField's createFields() methods) as NumericDocValuesField-s and SortedDocValuesField-s, respectively, and these require each doc to have a value, which apparently defaults to zero for NumericDocValuesField-s and the empty string for SortedDocValuesField-s. Here are the declarations for the field types that have this problem in DistribCursorPagingTest (from schema-sorts.xml):
{code:xml}
<fieldtype name="str_dv_last" class="solr.StrField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="str_dv_first" class="solr.StrField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="int_dv_last" class="solr.TrieIntField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="int_dv_first" class="solr.TrieIntField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="long_dv_last" class="solr.TrieLongField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="long_dv_first" class="solr.TrieLongField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="float_dv_last" class="solr.TrieFloatField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="float_dv_first" class="solr.TrieFloatField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
<fieldtype name="double_dv_last" class="solr.TrieDoubleField" stored="true" indexed="false" docValues="true" sortMissingLast="true"/>
<fieldtype name="double_dv_first" class="solr.TrieDoubleField" stored="true" indexed="false" docValues="true" sortMissingFirst="true"/>
{code}
I think that the above declarations should be disallowed by Solr, because they contain docValues=true + sortMissingLast|First=true; the user is asking for a particular sorting behavior for missing values, when there never will be missing values. Also, the Solr Ref Guide [says|https://cwiki.apache.org/confluence/display/solr/DocValues] about docvalue fields "If this type is used, the field must be either required or have a default value, meaning every document must have a value for this field." However, neither the above field types nor the fields using them are required or have a default specified. Maybe this should be enforced by schema parsing? Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it.
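The "enforced by schema parsing" check Steve suggests could look something like this minimal sketch (hypothetical method, not actual Solr schema-parsing code; the rule it encodes is the one proposed in the comment):

```java
// Hypothetical sketch: reject a field type that combines docValues with
// sortMissingFirst/sortMissingLast, since on such a field every document
// gets a (default) value, so the requested missing-value sort behavior
// can never actually apply.
public class SchemaCheck {
    static void validate(boolean docValues, boolean sortMissingFirst, boolean sortMissingLast) {
        if (docValues && (sortMissingFirst || sortMissingLast)) {
            throw new IllegalArgumentException(
                "sortMissingFirst/sortMissingLast have no effect when docValues=true: "
                + "every document will have a value for this field");
        }
    }

    public static void main(String[] args) {
        validate(false, true, false); // fine: indexed field, missing-sort meaningful
        try {
            validate(true, true, false); // the combination from schema-sorts.xml
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

(As Yonik notes in a later comment, LUCENE-5178 changed the "every document has a value" premise, so whether this check is still warranted depends on that.)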
[jira] [Commented] (SOLR-5666) Using the hdfs write cache can result in appearance of corrupted index.
[ https://issues.apache.org/jira/browse/SOLR-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882963#comment-13882963 ] ASF subversion and git services commented on SOLR-5666: --- Commit 1561751 from [~markrmil...@gmail.com] in branch 'dev/trunk' [ https://svn.apache.org/r1561751 ] SOLR-5666: Using the hdfs write cache can result in appearance of corrupted index. Using the hdfs write cache can result in appearance of corrupted index. --- Key: SOLR-5666 URL: https://issues.apache.org/jira/browse/SOLR-5666 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0, 4.7
[jira] [Commented] (SOLR-5666) Using the hdfs write cache can result in appearance of corrupted index.
[ https://issues.apache.org/jira/browse/SOLR-5666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882964#comment-13882964 ] ASF subversion and git services commented on SOLR-5666: --- Commit 1561752 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1561752 ] SOLR-5666: Using the hdfs write cache can result in appearance of corrupted index.
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13882976#comment-13882976 ] Yonik Seeley commented on SOLR-5652: bq. NumericDocValuesField-s and SortedDocValuesField-s, respectively, and these require each doc to have a value, Although that used to be true, it should no longer be the case: LUCENE-5178 Now one thing that does look a little fishy to me that might cause a problem is how things like IntComparator deal with missing values... it simply substitutes in MAX_INT or MIN_INT when the value is missing. If the tests here are generating random values, you might try taking out MAX_<numeric_type>, MIN_<numeric_type> and see if it makes a difference?
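Yonik's point about sentinel substitution can be shown in isolation. A sketch in plain Java (mimicking the substitution he describes, not the actual Lucene IntComparator): a doc with no value becomes indistinguishable, in sort order, from a doc whose real value is the sentinel, so ordering between the two is left entirely to tie-breaking.

```java
// Illustration: substituting Integer.MAX_VALUE for a missing value makes
// a missing doc tie with a doc whose real value is Integer.MAX_VALUE.
public class MissingValueSentinel {
    // Ascending comparison with null (missing) replaced by the sentinel.
    static int compareAsc(Integer a, Integer b) {
        int x = (a == null) ? Integer.MAX_VALUE : a;
        int y = (b == null) ? Integer.MAX_VALUE : b;
        return Integer.compare(x, y);
    }

    public static void main(String[] args) {
        // Missing vs. a real MAX_VALUE doc: the comparator sees a tie.
        System.out.println(compareAsc(null, Integer.MAX_VALUE)); // 0
        // Missing vs. an ordinary value: missing sorts last, as intended.
        System.out.println(compareAsc(null, 42)); // 1
    }
}
```

If the test's random values include the extremes, such a tie is exactly the kind of ambiguity that could make a cursor walk revisit a doc.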
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883010#comment-13883010 ] David Webster commented on SOLR-4470: - I have to admit I find the content of this issue to be disturbing coming from such a major Open Source project as Solr. I came here looking for a viable security solution that did not involve segmenting off the system or otherwise using IPsec and other IP-address-centric forms of security. Products simply must address security internally to ever be considered truly Enterprise-worthy solutions. This product does not, and even worse, the core Dev team seems intent on NEVER doing so! As the lead Java architect for Distributed Systems Engineering at a Fortune 100 company, security is my single most important concern. I don't care how fast a product is, or how many slick features it has; if it isn't secure at the core, it is worthless as an Enterprise solution (at least for any Enterprise that gives a whit about REAL security). Solr is doomed to use as a lab experiment for any serious Enterprise implementation where security is more than an afterthought. I like Solr. I like what it does and how it does it. However, its lack of internal security hooks is a complete show-stopper for use at my firm. So my choices are to internalize the code, using this patch as our starting point, and have our own Solr-like engine, or move on to something like ElasticSearch which actually cares about real security at the node-to-node level. Also, Mavenize the damned thing! Modern projects still use Ant?
I haven't opened a build.xml script in half a decade or more. Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for it to work credentials need to be provided here also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc based on calls to the Collection API) * that do not in any way have a relation to an outside super-request (e.g. replica syncing stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a subrequest, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic http auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but created this JIRA issue early in order to get input/comments from the community as early as possible.
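The credential-forwarding scheme described above ultimately comes down to attaching an HTTP Basic Authorization header to each internal sub-request. A minimal sketch of building that header (the function name is illustrative, not a Solr API):

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (RFC 7617)."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

# A synchronously-triggered sub-request would forward the caller's header
# verbatim; async/non-rooted requests would instead fall back to configured
# internal credentials, as proposed in the issue description.
header = basic_auth_header("user", "pass")
assert header == "Basic dXNlcjpwYXNz"
```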
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883023#comment-13883023 ] Hoss Man commented on SOLR-5652: bq. Although that used to be true, it should no longer be the case: LUCENE-5178 Right, see also: SOLR-5165 SOLR-5222 On IRC, i drew sarowe's attention to these issues and DocValuesMissingTest and he pointed out that DocValuesMissingTest uses the following... bq. @SuppressCodecs({Lucene40, Lucene41, Lucene42}) // old formats cannot represent missing values ...so this may be the smoking gun to explain what's going wrong here, since we don't do anything like this in the cursor tests. (yet ... i'm going to fix that now) Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
[jira] [Updated] (LUCENE-4072) CharFilter that Unicode-normalizes input
[ https://issues.apache.org/jira/browse/LUCENE-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Goldfarb updated LUCENE-4072: --- Attachment: 4072.patch Attaching a new patch - testCuriousString still fails. You're right about readInputToBuffer. I think we also have to stop only on normalization boundaries. I see two options: use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward) or use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) with mark() and reset().
{noformat}
private int readInputToBuffer() throws IOException {
  final int len = input.read(tmpBuffer);
  if (len == -1) {
    inputFinished = true;
    return 0;
  }
  inputBuffer.append(tmpBuffer, 0, len);
  if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
    return len;
  } else {
    return len + readInputToBuffer();
  }
}
{noformat}
CharFilter that Unicode-normalizes input Key: LUCENE-4072 URL: https://issues.apache.org/jira/browse/LUCENE-4072 Project: Lucene - Core Issue Type: New Feature Components: modules/analysis Reporter: Ippei UKAI Attachments: 4072.patch, DebugCode.txt, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, LUCENE-4072.patch, ippeiukai-ICUNormalizer2CharFilter-4752cad.zip I'd like to contribute a CharFilter that Unicode-normalizes input with ICU4J. The benefit of having this process as a CharFilter is that the tokenizer can work on normalised text while offset correction ensures that the fast vector highlighter and other offset-dependent features do not break. The implementation is available at the following repository: https://github.com/ippeiukai/ICUNormalizer2CharFilter Unfortunately this is my unpaid side-project and I cannot spend much time merging my work into Lucene to make an appropriate patch. I'd appreciate it if anyone could give it a go. I'm happy to relicense it to whatever meets your needs.
[jira] [Comment Edited] (LUCENE-4072) CharFilter that Unicode-normalizes input
[ https://issues.apache.org/jira/browse/LUCENE-4072?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883032#comment-13883032 ] David Goldfarb edited comment on LUCENE-4072 at 1/27/14 6:10 PM: - Attaching a new patch - testCuriousString still fails. You're right about readInputToBuffer. I think we also have to stop only on normalization boundaries. I see two options: use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward) or use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) with mark() and reset().
{noformat}
private int readInputToBuffer() throws IOException {
  final int len = input.read(tmpBuffer);
  if (len == -1) {
    inputFinished = true;
    return 0;
  }
  inputBuffer.append(tmpBuffer, 0, len);
  if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
    return len;
  } else {
    return len + readInputToBuffer();
  }
}
{noformat}
\[edit\] And the len >= 2 clause wasn't meant to be part of the patch, ignore that.
{noformat}
if (normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
  return len;
} else {
  return len + readInputToBuffer();
}
{noformat}
was (Author: dgoldfarb): Attaching a new patch - testCuriousString still fails. You're right about readInputToBuffer. I think we also have to stop only on normalization boundaries. I see two options: use normalizer.hasBoundaryAfter(tmpBuffer\[len-1\]) (straightforward) or use normalizer.hasBoundaryBefore(tmpBuffer\[len-1\]) with mark() and reset().
{noformat}
private int readInputToBuffer() throws IOException {
  final int len = input.read(tmpBuffer);
  if (len == -1) {
    inputFinished = true;
    return 0;
  }
  inputBuffer.append(tmpBuffer, 0, len);
  if (len >= 2 && normalizer.hasBoundaryAfter(tmpBuffer[len-1]) && !Character.isHighSurrogate(tmpBuffer[len-1])) {
    return len;
  } else {
    return len + readInputToBuffer();
  }
}
{noformat}
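The reason the code above keeps reading until it hits a normalization boundary is that a combining mark arriving in the next buffer can change the normalized form of characters already read. The same hazard can be shown with Python's stdlib unicodedata (standing in here for ICU's Normalizer2 — an illustration, not the patch's code):

```python
import unicodedata

# "e" followed by COMBINING ACUTE ACCENT normalizes (NFC) to a single "é".
decomposed = "e\u0301"
assert unicodedata.normalize("NFC", decomposed) == "\u00e9"

# If a read boundary splits the pair, normalizing each chunk independently
# gives the wrong result -- exactly the hazard readInputToBuffer() avoids by
# reading on until it sees a normalization boundary:
chunk1, chunk2 = decomposed[0], decomposed[1]
wrong = unicodedata.normalize("NFC", chunk1) + unicodedata.normalize("NFC", chunk2)
assert wrong != unicodedata.normalize("NFC", decomposed)
```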
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883044#comment-13883044 ] Hoss Man commented on SOLR-5652: To clarify one thing: steve mentioned that it seems like there are two problems... bq. It looks to me like there are two problems here: 1) the same doc is showing up on different pages when deep paging; and 2) missing docvalue docs are sorted incorrectly. As far as #2 goes, now that we log every doc on every page, i can confirm that when i try some of these failed seeds (for example steve's #129 log), i also see the incorrect ordering even though the test passes for me -- so #2 is almost certainly the codec issue. That still leaves the question about #1, and why it isn't completely reproducible -- but that may just be an artifact of #2 (ie: if these codecs have non-deterministic behavior when trying to access missing values, there could be arbitrary data in a reused bytebuffer)
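For context, the sortMissingLast/sortMissingFirst semantics the failing fields declare place documents with no docValue at a fixed end of the results regardless of sort direction. A rough sketch of the expected (correct) behaviour, using hypothetical doc dicts rather than Solr code:

```python
# Emulate Solr's sortMissingLast for a field sort: docs missing a value for
# the sort field go to the end whether the sort is ascending or descending.
# None marks a missing docValue; the docs and ids are invented for the example.
docs = [{"id": 1, "v": "b"}, {"id": 2, "v": None}, {"id": 3, "v": "a"}]

def sort_missing_last(docs, asc=True):
    present = sorted((d for d in docs if d["v"] is not None),
                     key=lambda d: d["v"], reverse=not asc)
    missing = [d for d in docs if d["v"] is None]
    return present + missing          # sortMissingFirst would prepend instead

assert [d["id"] for d in sort_missing_last(docs)] == [3, 1, 2]
assert [d["id"] for d in sort_missing_last(docs, asc=False)] == [1, 3, 2]
```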
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883048#comment-13883048 ] Hoss Man commented on SOLR-5652: bq. Also, the Solr Ref Guide says about docvalue fields... fixed.
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883053#comment-13883053 ] Shawn Heisey commented on SOLR-4470: bq. This product does not, and even worse, the core Dev team seems intent on NEVER doing so! I don't know that we *never* intend on adding security. We face a major problem with doing so at this time, though: We have absolutely no idea what servlet container the user is going to use for running the solr war. The example includes jetty, but aside from a few small edits in the stock config file, it is unmodified. Solr has no control over the server-side HTTP layer right now, so anything we try to do will almost certainly be wrong as soon as the user changes containers or decides to modify their container config. Solr 5.0 will not ship as a .war file. The work hasn't yet been done that will turn it into an actual application, but it will be done before 5.0 gets released. Once Solr is a real application that owns and fully controls the HTTP layer, security will not be such a nightmare. You mention ElasticSearch and its ability to deal with security. ES is already a standalone application, which means they can do a lot of things that Solr currently can't. It's a legitimate complaint with Solr, one that we are trying to rectify. bq. Also, Mavenize the damned thing! Modern projects still use Ant? I haven't opened a build.xml script in half a decade or more I can't say anything about maven vs. ant. I don't have enough experience with either. 
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883085#comment-13883085 ] David Webster commented on SOLR-4470: - Thanks for the update, Shawn. The move to a stand-alone implementation should be a good one, with hope that a robust security implementation will be at the very top of the priority list. Not sure what the timeline for that is, but I've got a fairly short one for laying down the foundation of our Enterprise Search by 3rd Qtr. That will have to pass IA muster (mainstream Solr does not), which still leaves me in a bit of a quandary as to how to proceed. I don't want the added TCO of maintaining our own search engine, but cannot wait around very long for viable solutions to surface, either. I'm either going to have to implement this patch branch, or move on to other engine choices... I know JBoss, JBPM specifically, used to be Ant-based but they've gone full Maven now. This is the first big Open Source project I've run across in some time that still uses Ant. Not many devs on our staff can still read a build.xml file anymore...and those that can would rather not...
[jira] [Commented] (SOLR-5463) Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging)
[ https://issues.apache.org/jira/browse/SOLR-5463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883114#comment-13883114 ] Hoss Man commented on SOLR-5463: bq. Some further thoughts: ... Yonik: no disagreement from me, but since what we've got so far has already been committed and backported to 4x, i think it would make sense to track your enhancement ideas in new issues for tracking purposes (unless you think you can help bang these out before 4.7). Provide cursor/token based searchAfter support that works with arbitrary sorting (ie: deep paging) -- Key: SOLR-5463 URL: https://issues.apache.org/jira/browse/SOLR-5463 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 5.0, 4.7 Attachments: SOLR-5463-randomized-faceting-test.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man.patch, SOLR-5463__straw_man__MissingStringLastComparatorSource.patch I'd like to revisit a solution to the problem of deep paging in Solr, leveraging an HTTP-based API similar to how IndexSearcher.searchAfter works at the lucene level: require the clients to provide back a token indicating the sort values of the last document seen on the previous page. This is similar to the cursor model I've seen in several other REST APIs that support pagination over large sets of results (notably the twitter API and its since_id param) except that we'll want something that works with arbitrary multi-level sort criteria that can be either ascending or descending.
SOLR-1726 laid some initial groundwork here and was committed quite a while ago, but the key bit of argument parsing to leverage it was commented out due to some problems (see comments in that issue). It's also somewhat out of date at this point: at the time it was committed, IndexSearcher only supported searchAfter for simple scores, not arbitrary field sorts; and the params added in SOLR-1726 suffer from this limitation as well. --- I think it would make sense to start fresh with a new issue with a focus on ensuring that we have deep paging which: * supports arbitrary field sorts in addition to sorting by score * works in distributed mode {panel:title=Basic Usage} * send a request with {{sort=X&start=0&rows=N&cursorMark=*}} ** sort can be anything, but must include the uniqueKey field (as a tie breaker) ** N can be any number you want per page ** start must be 0 ** \* denotes you want to use a cursor starting at the beginning mark * parse the response body and extract the (String) {{nextCursorMark}} value * Replace the \* value in your initial request params with the {{nextCursorMark}} value from the response in the subsequent request * repeat until the {{nextCursorMark}} value stops changing, or you have collected as many docs as you need {panel}
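The Basic Usage loop above can be sketched against an in-memory result set (no HTTP involved; only the paging contract matters). The mark here encodes the sort values of the last document on the previous page, mirroring IndexSearcher.searchAfter; names and data are illustrative, not Solr APIs:

```python
# Docs pre-sorted by (score, id) -- the uniqueKey acts as the tie breaker,
# as the usage notes above require. All data here is invented.
DOCS = sorted([{"id": i, "score": i % 5} for i in range(10)],
              key=lambda d: (d["score"], d["id"]))

def search(cursor_mark, rows):
    """One 'page' request: '*' means start at the beginning; otherwise the
    mark is the (score, id) of the last doc on the previous page."""
    if cursor_mark == "*":
        page = DOCS[:rows]
    else:
        page = [d for d in DOCS if (d["score"], d["id"]) > cursor_mark][:rows]
    next_mark = (page[-1]["score"], page[-1]["id"]) if page else cursor_mark
    return page, next_mark

# Client loop from "Basic Usage": repeat until the mark stops changing.
collected, mark = [], "*"
while True:
    page, next_mark = search(mark, rows=3)
    collected.extend(page)
    if next_mark == mark:
        break
    mark = next_mark

assert [d["id"] for d in collected] == [d["id"] for d in DOCS]
```

Because the cursor is the last seen sort values rather than a numeric offset, a page never re-walks documents already returned, which is what makes deep paging cheap.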
[jira] [Commented] (SOLR-5488) Fix up test failures for Analytics Component
[ https://issues.apache.org/jira/browse/SOLR-5488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883117#comment-13883117 ] Steven Bower commented on SOLR-5488: I finally got a linux box at home to repro this issue (well, at least a similar one).. I think the issue is in how it identifies individual components of a query so that they are not duplicated throughout the query execution.. i think it's just associating the wrong stats collectors with query components.. i've narrowed it down to that but am not quite sure exactly where this is or why it is so ephemeral.. Fix up test failures for Analytics Component Key: SOLR-5488 URL: https://issues.apache.org/jira/browse/SOLR-5488 Project: Solr Issue Type: Bug Affects Versions: 5.0, 4.7 Reporter: Erick Erickson Assignee: Erick Erickson Attachments: SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, SOLR-5488.patch, eoe.errors The analytics component has a few test failures, perhaps environment-dependent. This is just to collect the test fixes in one place for convenience when we merge back into 4.x
lucene-solr pull request: Adds logParamsList parameter to support reduced l...
GitHub user cpoerschke opened a pull request: https://github.com/apache/lucene-solr/pull/23 Adds logParamsList parameter to support reduced logging. For https://issues.apache.org/jira/browse/SOLR-5672 add logParamsList parameter to support reduced logging. You can merge this pull request into a Git repository by running: $ git pull https://github.com/bloomberg/lucene-solr branch_4x-fewer-params-logged Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/23.patch commit e6f82c935d5f8ee6b225be41b5a6615833fc3029 Author: Christine Poerschke cpoersc...@bloomberg.net Date: 2014-01-24T13:17:44Z Adds logParamsList parameter to support reduced logging.
[jira] [Created] (SOLR-5672) add logParamsList parameter to support reduced logging
Christine Poerschke created SOLR-5672: - Summary: add logParamsList parameter to support reduced logging Key: SOLR-5672 URL: https://issues.apache.org/jira/browse/SOLR-5672 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke The use case we have is that logging full requests in each shard is just 'too much' but at the same time we wish to be able to tie together requests across shards. In certain circumstances we also wish to fully log some requests.
[jira] [Commented] (SOLR-5672) add logParamsList parameter to support reduced logging
[ https://issues.apache.org/jira/browse/SOLR-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883146#comment-13883146 ] Christine Poerschke commented on SOLR-5672: --- The change https://github.com/apache/lucene-solr/pull/23 adds a new parameter. If it is missing then behaviour will be as it is now. If it is supplied the following use cases are possible:
{code}
...logParamsList=     # don't log any parameters
...logParamsList=q,fq # log only the q and fq parameters
{code}
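A hypothetical sketch of the filtering behaviour described above (not the actual patch code): parameter absent means unchanged logging, empty means log no parameters, otherwise log only the listed ones:

```python
def params_to_log(params, log_params_list=None):
    """Filter request params for logging per a logParamsList-style value.

    None  -> parameter missing: log everything (current behaviour).
    ""    -> log no parameters.
    "a,b" -> log only parameters a and b.
    """
    if log_params_list is None:
        return dict(params)
    allowed = {p for p in log_params_list.split(",") if p}
    return {k: v for k, v in params.items() if k in allowed}

req = {"q": "solr", "fq": "type:doc", "rows": "10"}
assert params_to_log(req) == req
assert params_to_log(req, "") == {}
assert params_to_log(req, "q,fq") == {"q": "solr", "fq": "type:doc"}
```

The shard-correlation use case from the issue description works because an identifying parameter can be kept in the list while the bulky ones are dropped.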
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883171#comment-13883171 ] Shalin Shekhar Mangar commented on SOLR-5473: - Some comments on the latest patch: # AbstractFullDistribZkTestBase has a useExternalCollection() which is hard coded to false. Why? Can we randomize using external collections in the base test to have better test coverage? # ClusterState.getCollections has a todo which says “fix later JUnit is failing”. Which test is failing? # What is _stateVer_ used for? I guess it is for SOLR-5474 and not this issue? # This patch has only whitespace related changes to CloudSolrServer. # There is wrong formatting and incorrect spacing in the new code such as Overseer.createCollection, new methods in ClusterState etc. You should re-format all the new/modified code blocks # There was one forbidden-api check failure where new String(byte[]) constructor is used in a log message. Run ant check-forbidden-apis from inside the solr directory. # There are three javadoc errors (run ant precommit): {code} [ecj-lint] 1. ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java (at line 199) [ecj-lint] /** @deprecated [ecj-lint] ^^ [ecj-lint] Javadoc: Description expected after @deprecated [ecj-lint] -- [ecj-lint] 2. ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ClusterState.java (at line 297) [ecj-lint] * @deprecated [ecj-lint]^^ [ecj-lint] Javadoc: Description expected after @deprecated [ecj-lint] -- [ecj-lint] -- [ecj-lint] 3. 
ERROR in /Users/shalinmangar/work/oss/solr-trunk/solr/solrj/src/java/org/apache/solr/common/cloud/ZkStateReader.java (at line 759) [ecj-lint] * @param coll [ecj-lint] [ecj-lint] Javadoc: Description expected after this reference [ecj-lint] -- [ecj-lint] 3 problems (3 errors) {code} Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5672) add logParamsList parameter to support reduced logging
[ https://issues.apache.org/jira/browse/SOLR-5672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883169#comment-13883169 ] Mark Miller commented on SOLR-5672: --- +1 add logParamsList parameter to support reduced logging -- Key: SOLR-5672 URL: https://issues.apache.org/jira/browse/SOLR-5672 Project: Solr Issue Type: Improvement Reporter: Christine Poerschke The use case we have is that logging full requests in each shard is just 'too much' but at the same time we wish to be able to tie together requests across shards. In certain circumstances we also wish to fully log some requests. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5473) Make one state.json per collection
[ https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883195#comment-13883195 ] Mark Miller commented on SOLR-5473: --- I'm fairly busy in the short term - going out of town for a few days. But I intend to review this as well. Make one state.json per collection -- Key: SOLR-5473 URL: https://issues.apache.org/jira/browse/SOLR-5473 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch As defined in the parent issue, store the states of each collection under /collections/collectionname/state.json node -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
REMINDER: Call For Papers: ApacheCon North America 2014 -- ends Feb 1st
(Note: cross posted, please keep any replies to general@lucene) Quick reminder that the CFP for ApacheCon (Denver) ends on Saturday... http://events.linuxfoundation.org/events/apachecon-north-america/program/cfp Ladies and Gentlemen, start writing your proposals. The Call For Papers for ApacheCon North America 2014 is now open, and is open until February 1st, 2014. Note that we are on a very short timeline this year, so don't assume that we'll extend the CFP, just because we've done so every time before. -Hoss - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883288#comment-13883288 ] Shawn Heisey commented on SOLR-5670: Reducing heap requirements by not requiring data to go into the FieldCache is a major win for huge indexes. GC can be a major source of performance issues even if you've got garbage collection superbly tuned, and I doubt that my tuning parameters are perfect. _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883290#comment-13883290 ] Per Steffensen commented on SOLR-4470: -- bq. This product does not, and even worse, the core Dev team seems intent on NEVER doing so! At least most of them, yes. It is really a shame. bq. As the lead Java architect for Distributed Systems Engineering at a fortune 100 company, security is my single most important concern As the tech lead on the largest REAL SolrCloud installation on the planet, I agree :-) I believe I can say that we have the largest installation in the world for two reasons * Upgrading from one version of SolrCloud to the next is not something that seems to be very important in this product. At least it is hard to do, and there seems to be no testing of it when a new release 4.y comes out - no testing that you can actually upgrade to it from 4.x. This makes me believe that no one, or at least only a few, has an installation so big that just installing 4.y and storing/indexing all data from the old 4.x installation from scratch is not an option. If others actually had to do upgrades where this is not possible, lots of complaints would pop up - and they don't * Our biggest system stores and indexes 1-2 billion documents per day, and has 2 years of history. That is about 1000 billion documents in Solr at any time with 1-2 billion going in every day (and 30-60 billion going out every month). To be able to run such a system we needed to do numerous optimizations, and in general without optimizations you will never get such a big system working. I do not see much talk around here about optimizations of that kind - probably because people have not run into the problems yet. bq. I like Solr. I like what it does and how it does it. Me too. On that part it actually has numerous advantages over e.g. ElasticSearch. 
We used ES to begin with, and we liked it, but for political reasons we were not allowed to keep using it, and we turned to find an alternative. At that point in time SolrCloud (4.x) was only in its startup phase (a year before 4.0 was released), but we believed so much in the idea behind it that we decided to go for it. bq. However, its lack of internal security hooks is a complete show stopper for use at my firm For us, too. That is why we made our own fix to it - provided as a patch here and also available at https://github.com/steff1193/lucene-solr bq. Using this patch as our starting point I am happy to hear that. Please feel free to contact me if you have any problems making it work or understanding what it does. I might also be able to provide a few tips on making it extra secure :-) bq. and have our own Solr-like engine We made the same decision years ago. We have had our own version of Solr in our own VCS for years. Just recently I put the code on https://github.com/steff1193/lucene-solr. No releases (incl maven artifacts) yet. But that will come soon. Until then you will have to build it yourself from source. bq. Also, Mavenize the damned thing! Modern projects still use Ant? I haven't opened a build.xml script in half a decade or more Already done. {code} ant [-Dversion=$VERSION] get-maven-poms {code} This will build the maven structure in the folder maven-build. E.g. if you use Eclipse {code} ant eclipse {code} In Eclipse right-click the root-folder, choose Import... and Existing Maven Project. Import all Maven pom.xmls from the maven-build folder bq. We have absolutely no idea what servlet container the user is going to use for running the solr war. It isn't important for this issue. Protecting the HTTP endpoints with authentication and authorization is standardized in the servlet-spec. All web-containers have to live up to that standard (to be certified). 
The only place where the standardization is not very clear is how to install a realm (the thingy knowing about user-credentials and roles), but all containers have plenty of documentation on how to do it. It is very important to understand that this issue, and the patch I provided, will work for any web-container. This issue is not about enforcing the protection - let the web-container do that. This issue and the patch are ONLY about enabling Solr to send credentials in its Solr-node-to-Solr-node requests, so that things will keep working, if/when you make the obvious security decision and make use of the security-features provided to you for free by the container. bq. Solr has no control over the server-side HTTP layer right now, so anything we try to do will almost certainly be wrong as soon as the user changes containers or decides to modify their container config. NO! bq. Solr 5.0 will not ship as a .war file Bad idea. This is one of the points where Solr made a better decision than ES bq. Once Solr is a real application that owns and fully controls the HTTP layer,
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5652: --- Attachment: SOLR-5652.codec.skip.dv.patch Rather than just using SuppressCodecs in this test, here's a patch that checks to see if the codec supports docvalues with sort missing, and if not then it skips those fields -- but the other fields are still checked. You can see it working by comparing the log messages (showing the fields tested) between things like... {noformat} ant test -Dtestcase=DistribCursorPagingTest -Dtests.codec=Lucene40 vs ant test -Dtestcase=DistribCursorPagingTest -Dtests.codec=Lucene45 {noformat} Before I commit this though, I really want to add explicit sanity checking that the docs are in the expected order so we can see a definitive and consistent fail from the problem this tries to prevent ... I'm going to work on that this afternoon. (I also want to add docValues fields to the test schema that don't use either sortMissingLast _or_ sortMissingFirst, and just rely on the default behavior ... not sure why I didn't think to include that in the first place) Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
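[Editor's note] The suspect fields all use sortMissingLast or sortMissingFirst. In plain-Java terms (this is an illustration of the intended ordering semantics, not Solr's actual sorting code), the two options behave like null-handling comparators, with null standing in for a document that has no value in the sort field:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class MissingSortSketch {
    public static void main(String[] args) {
        // null stands in for a document with no value in the sort field
        List<Integer> docs = new ArrayList<>(Arrays.asList(3, null, 1, 2));

        // sortMissingLast=true: docs without a value sort after all others
        List<Integer> last = new ArrayList<>(docs);
        last.sort(Comparator.nullsLast(Comparator.<Integer>naturalOrder()));
        System.out.println(last);   // [1, 2, 3, null]

        // sortMissingFirst=true: docs without a value sort before all others
        List<Integer> first = new ArrayList<>(docs);
        first.sort(Comparator.nullsFirst(Comparator.<Integer>naturalOrder()));
        System.out.println(first);  // [null, 1, 2, 3]
    }
}
```

Old docvalues formats that cannot represent missing values break this contract, which is consistent with the failures clustering on those fields.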
[jira] [Updated] (SOLR-5671) Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected
[ https://issues.apache.org/jira/browse/SOLR-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-5671: - Description: Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. All the failures were using either Lucene41 or Lucene42 codec was: Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. Heisenbug #2 in DistribCursorPagingTest: full walk returns one fewer doc than expected --- Key: SOLR-5671 URL: https://issues.apache.org/jira/browse/SOLR-5671 Project: Solr Issue Type: Bug Affects Versions: 4.7 Reporter: Steve Rowe Twice on Uwe's Jenkins, DistribCursorPagingTest has paged through a small number of indexed docs and retrieved one fewer doc than the number of indexed docs. Both of these failures were on trunk on Windows: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3708/ http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Windows/3713/ I've also seen this twice on trunk on my OS X laptop (out of 875 trials). None of the seeds have reproduced for me. 
All the failures were using either Lucene41 or Lucene42 codec -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883317#comment-13883317 ] Robert Muir commented on SOLR-5652: --- {quote} On IRC, i drew sarowe's attention to these issues and DocValuesMissingTest and he pointed out that DocValuesMissingTest uses the following... @SuppressCodecs({Lucene40, Lucene41, Lucene42}) // old formats cannot represent missing values ...so this may be the smoking gun to explain what's going wrong here, since we don't do anything like this in the cursor tests. (yet ... i'm going to fix that now) {quote} Dammit, I feel pretty terrible. You guys have been debugging this thing for a long time, and I've been trying to stay up to date on the issue, but not once did I even think about this... Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-5670) _version_ either indexed OR docvalue
[ https://issues.apache.org/jira/browse/SOLR-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-5670. Resolution: Fixed Fix Version/s: 4.7 5.0 Committed. Thanks! _version_ either indexed OR docvalue Key: SOLR-5670 URL: https://issues.apache.org/jira/browse/SOLR-5670 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.7 Reporter: Per Steffensen Assignee: Per Steffensen Labels: solr, solrcloud, version Fix For: 5.0, 4.7 Attachments: SOLR-5670.patch, SOLR-5670.patch As far as I can see there is no good reason to require that _version_ field has to be indexed if it is docvalued. So I guess it will be ok with a rule saying _version_ has to be either indexed or docvalue (allowed to be both). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
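[Editor's note] The rule the committed change relaxes amounts to a one-line predicate; a hypothetical sketch of the check (not the actual Solr schema-validation code):

```java
public class VersionFieldCheck {
    /**
     * Old rule: _version_ had to be indexed.
     * Relaxed rule per SOLR-5670: it must be indexed OR have docValues
     * (being both is still allowed).
     */
    static boolean versionFieldOk(boolean indexed, boolean docValues) {
        return indexed || docValues;
    }

    public static void main(String[] args) {
        System.out.println(versionFieldOk(true, false));   // indexed only: ok
        System.out.println(versionFieldOk(false, true));   // docValues only: now ok
        System.out.println(versionFieldOk(false, false));  // neither: rejected
    }
}
```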
[jira] [Commented] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883402#comment-13883402 ] Steve Rowe commented on SOLR-5652: -- bq. rather than just using SuppressCodecs in this test, here's a patch that checks to see if the codec supports docvalues with sort missing, and if not then it skips those fields – but the other fields are still checked. +1, looks good, though on trunk Lucene3x and Appending can be removed from the blacklist in LTC.defaultCodecSupportsMissingDocValues(). I see these elsewhere on trunk (Solr tests only), though, so maybe they're not just vestiges? Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar, non-reproducible seed, failure on his machine) Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on same field asc has worked fine just before this (fields are in arbitrary order, but asc always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild recorded in comments) -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883426#comment-13883426 ] Mark Miller commented on SOLR-4470: --- The bulk of this patch was not that contentious. The rest seemed to mostly be hashed out. The missing piece has been a committer with the skill and time to put it in, take responsibility for it, and support it. Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for it to work credentials need to be provided there also. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers. E.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered from outside requests (e.g. shard creation/deletion/etc based on calls to the Collection API) * that do not in any way have a relation to an outside super-request (e.g. replica synching stuff) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a subrequest, and fall back to configured internal credentials for the asynchronous/non-rooted requests. 
In our solution we would aim at only supporting basic http auth, but we would like to make a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution but create this JIRA issue early in order to get input/comments from the community as early as possible. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
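[Editor's note] The credential-forwarding described in the issue boils down to attaching a Basic auth header to each internal sub-request. A self-contained plain-Java sketch of building that header (this is an illustration, not the patch's actual code; the user/password values are made up):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthSketch {
    /** Builds the value of an HTTP "Authorization: Basic ..." header. */
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // A node forwarding the original caller's credentials would set this
        // header on its synchronous sub-requests; a configured fallback
        // "internal" user would be used for asynchronous, non-rooted requests.
        String header = basicAuthHeader("solr-internal", "secret");
        System.out.println(header);
    }
}
```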
[jira] [Commented] (LUCENE-5376) Add a demo search server
[ https://issues.apache.org/jira/browse/LUCENE-5376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883452#comment-13883452 ] Arcadius Ahouansou commented on LUCENE-5376: Hello. I have checked out this branch and ran an ant clean package-zip in the lucene directory. The build was successful and many artefacts were created including: - lucene-xml-query-demo.war - lucene-demo-5.0-SNAPSHOT.jar - lucene-server-5.0-SNAPSHOT.jar I dropped the war into a fresh jetty 9 install and jetty was not happy (see stacktrace below). My questions are: - How do the demo and the new server package fit together? - How do I run the demo? Thanks. {code} at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source) at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source) at java.lang.reflect.Method.invoke(Unknown Source) at org.eclipse.jetty.start.Main.invokeMain(Main.java:297) at org.eclipse.jetty.start.Main.start(Main.java:724) at org.eclipse.jetty.start.Main.main(Main.java:103) 2014-01-27 22:21:36.288:WARN:lucene-xml-query-demo:main: unavailable javax.servlet.UnavailableException: org.apache.lucene.xmlparser.webdemo.FormBasedXmlQueryDemo at org.eclipse.jetty.servlet.BaseHolder.doStart(BaseHolder.java:102) at org.eclipse.jetty.servlet.ServletHolder.doStart(ServletHolder.java:294) {code} Add a demo search server Key: LUCENE-5376 URL: https://issues.apache.org/jira/browse/LUCENE-5376 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: lucene-demo-server.tgz I think it'd be useful to have a demo search server for Lucene. Rather than being fully featured, like Solr, it would be minimal, just wrapping the existing Lucene modules to show how you can make use of these features in a server setting. 
The purpose is to demonstrate how one can build a minimal search server on top of APIs like SearcherManager, SearcherLifetimeManager, etc. This is also useful for finding rough edges / issues in Lucene's APIs that make building a server unnecessarily hard. I don't think it should have back compatibility promises (except Lucene's index back compatibility), so it's free to improve as Lucene's APIs change. As a starting point, I'll post what I built for the eating your own dog food search app for Lucene's/Solr's jira issues http://jirasearch.mikemccandless.com (blog: http://blog.mikemccandless.com/2013/05/eating-dog-food-with-lucene.html ). It uses Netty to expose basic indexing/searching APIs via JSON, but it's very rough (lots of nocommits). -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
maven build issues with non-numeric custom version
From: http://svn.apache.org/repos/asf/lucene/dev/branches/branch_4x/dev-tools/maven/README.maven It says we can get a custom build number using: ant -Dversion=my-special-version get-maven-poms but this fails with: BUILD FAILED /Users/ryan/workspace/apache/lucene_4x/build.xml:141: The following error occurred while executing this line: /Users/ryan/workspace/apache/lucene_4x/lucene/common-build.xml:1578: The following error occurred while executing this line: /Users/ryan/workspace/apache/lucene_4x/lucene/tools/custom-tasks.xml:122: Malformed module dependency from 'lucene-analyzers-phonetic.internal.test.dependencies': 'lucene/build/analysis/common/lucene-analyzers-common-my-special-version.jar' Using a numeric version number things work OK. Any ideas? ryan
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13883459#comment-13883459 ] Jan Høydahl commented on SOLR-4470: --- I started the port to trunk along with some other changes last summer, but did not get to finalize it within the time available at that time. I also realized I need some help moving along as I'm quite a novice at servlet security. Implementing this patch for 5.0 and 4.x would still be worth the effort, should we choose to replace the container with Netty or something else, since most of the internal inter-node communication will stay the same - is that correct? When I dived into this last time around the intent was to commit a working impl to trunk first, let it bake for a few weeks (perhaps with the test framework randomizing security on/off) and then backport. This is best practice for big changes, and this patch is HUGE. So here is one committer willing to contribute, but I need some help from someone willing to take a look at https://github.com/cominvent/lucene-solr/tree/SOLR-4470 and finding out what 1% is missing for it to work, and then get it up to date with current trunk... Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch We want to protect any HTTP-resource (url). We want to require credentials no matter what kind of HTTP-request you make to a Solr-node. It can fairly easily be achieved as described on http://wiki.apache.org/solr/SolrSecurity. 
The problem is that Solr-nodes also make internal requests to other Solr-nodes, and for these to work, credentials need to be provided as well. Ideally we would like to forward credentials from a particular request to all the internal sub-requests it triggers, e.g. for search and update requests. But there are also internal requests * that are only indirectly/asynchronously triggered by outside requests (e.g. shard creation/deletion/etc. based on calls to the Collection API) * that do not in any way relate to an outside super-request (e.g. replica syncing) We would like to aim at a solution where original credentials are forwarded when a request directly/synchronously triggers a sub-request, with a fallback to configured internal credentials for the asynchronous/non-rooted requests. In our solution we would aim at only supporting basic http auth, but we would like to build a framework around it, so that not too much refactoring is needed if you later want to add support for other kinds of auth (e.g. digest). We will work on a solution, but are creating this JIRA issue early in order to get input/comments from the community as early as possible. -- This message was sent by Atlassian JIRA (v6.1.5#6160) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
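The forward-or-fall-back scheme described above can be sketched in miniature. This is a hedged illustration only: the class, field, and method names below are hypothetical and are not Solr's actual API; the internal credentials are assumed to come from configuration.

```java
import java.util.Base64;

// Hypothetical sketch (not Solr code) of the credential-forwarding idea:
// forward the original request's Basic auth credentials when the sub-request
// is triggered synchronously, otherwise fall back to configured internal
// credentials for asynchronous/non-rooted requests.
public class InternalAuthSketch {
    static final String INTERNAL_USER = "solr"; // assumed config value
    static final String INTERNAL_PASS = "pass"; // assumed config value

    // Build the Authorization header value for an internal sub-request.
    static String authHeaderFor(String forwardedUser, String forwardedPass) {
        String user = forwardedUser != null ? forwardedUser : INTERNAL_USER;
        String pass = forwardedPass != null ? forwardedPass : INTERNAL_PASS;
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + pass).getBytes());
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // Synchronously triggered sub-request: forward the caller's credentials.
        System.out.println(authHeaderFor("alice", "secret"));
        // Non-rooted internal request: fall back to internal credentials.
        System.out.println(authHeaderFor(null, null));
    }
}
```

The point of the sketch is only the decision rule (forward when present, fall back when not); how the header is attached to the outgoing request would depend on the HTTP client used.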
lucene-solr pull request: Lucene 5092 pull 1
GitHub user PaulElschot opened a pull request: https://github.com/apache/lucene-solr/pull/24 Lucene 5092 pull 1 DocBlocksIterator extends DocIdSetIterator. FixedBitSetDBI and EliasFanoDocIdSet implement DocBlocksIterator. The join module ToParent/ToChild queries use DocBlocksIterator instead of FixedBitSet. You can merge this pull request into a Git repository by running: $ git pull https://github.com/PaulElschot/lucene-solr LUCENE-5092-pull-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/lucene-solr/pull/24.patch commit 0b4c85b1b30426f34f65a03c32bb2618e1d03f99 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T19:31:14Z Ignore *.*~ and *.jar files commit 9a3c80013219b986340cd5a470fb30d20d35504a Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T20:35:54Z Add first version of DocBlockIterator commit 77341eed771facde8cf89bc85c99fe0ccd6bd257 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T20:53:00Z OpenBitSetIterator extends DocBlockIterator, advanceToJustBefore() not yet implemented. 
commit d920b8e6f2fbf39da42a5eff19301c4ca92647c6 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T21:46:48Z Initial implementation of OpenBitSetIterator.advanceToJustBefore() commit ebff7763d31518989882909da56e0b9be22a4f89 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T21:57:38Z The OpenBitSetIterator constructor not using an OpenBitSet can not easily be deleted commit 4166b0e4fa44b10f7c25158a811ff8593d540957 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-19T22:16:30Z More detailed plan commit 807f98db323ee78454d6bb7d76a9d40d89e8126b Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T19:11:17Z Rename to DocBlocksIterator commit 7ea28b0443e62d4e02458943a06cd97a9c8ad843 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T19:17:09Z Rename to class DocBlocksIterator commit 42e4bbc18769f7f91a6dfd730cc5d7d51582cb6c Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T19:52:21Z Adapted ToParentBlockJoinQuery to use DocBlocksIterator directly from FBS, tests pass commit 3d7819bc9e3b8754e6f882e60a0920800ba09954 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T21:19:53Z Remove some commented code commit 4b2a7a4a529810dbf742958463c3f9327444f3b1 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-20T22:26:27Z Getting closer with ToChildBJQ commit 24032392ede9b8b2997152f4f6aec3af03a6e550 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-21T15:16:21Z Merge branch 'trunk' into docblocksiter commit 8fde265979ba8913045a3f9cd87a15482739cc43 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-21T16:49:48Z Always set OpenBitSet attribute in OpenBitSetIterator commit b7627dd4f41aff421af6d9a0781fcc13fe668995 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-21T16:51:06Z Added a test for advanceToJustBefore in BaseDocIdSetTestCase, TestFixedBitSet fails commit f1966ae5b4f375c7451ff083288e409a0b41b9ef Author: Paul Elschot paul.j.elsc...@gmail.com Date: 
2014-01-21T21:14:11Z Previous test seed passes, next one fails commit c198cd8b6b06187c65477f088dad918974721099 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-22T23:49:52Z Added OpenBitSetDocBlocksIterator commit c29094ceba3bec8773e51c17fe3c80abab5ae526 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-22T23:53:00Z Merge branch 'trunk' of https://github.com/apache/lucene-solr into docblocksiter commit 7f7d8901bb396b82a0e874ca1f3c4264806fcd8e Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T20:37:49Z Improve ignoring lib directories commit e8abc6f30060ac10de886b6fcc225d561e4758b5 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T21:20:13Z Added FixedBitSetDBI, tests pass. FixedBitSet.java from trunk, made some private things protected. commit f78dca9bdf2b79fe3fbb7b80898fb88420891418 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T21:30:05Z Remove some unused imports commit 273a7e80767252f9748878878b0e9d742d2df669 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T21:33:17Z Remove commented println lines commit 3f93aa8d76422844d141fc2070a236e780e577f8 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:24:09Z Add TestDocIdSetBenchMark.java. Note: no APL 2.0 commit 3ca778ffee79cc9bd549e4b0dd37e00f16ba6320 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:26:06Z Add assert message commit 50f0175fda3637b88e982f285021921c69fe4dff Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:26:22Z Correct comment commit d07201d00dada7d3c4bde33471dac3accdb9b1e8 Author: Paul Elschot paul.j.elsc...@gmail.com Date: 2014-01-23T23:26:52Z Remove
[jira] [Commented] (LUCENE-5092) join: don't expect all filters to be FixedBitSet instances
[ https://issues.apache.org/jira/browse/LUCENE-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883518#comment-13883518 ] Paul Elschot commented on LUCENE-5092: -- I have opened this pull request: https://github.com/apache/lucene-solr/pull/24 In case a patch is preferred, please let me know. In the pull request: DocBlocksIterator extends DocIdSetIterator. FixedBitSetDBI and EliasFanoDocIdSet implement DocBlocksIterator, so EliasFanoDocIdSet could also be used for joins. The join module ToParent/ToChild queries use DocBlocksIterator instead of FixedBitSet. In the join module, FixedBitSetCachingWrapperFilter.java is replaced by DocBlocksCachingWrapperFilter which uses FixedBitSetDBI for now. LUCENE-5416 is open for FixedBitSetDBI. join: don't expect all filters to be FixedBitSet instances -- Key: LUCENE-5092 URL: https://issues.apache.org/jira/browse/LUCENE-5092 Project: Lucene - Core Issue Type: Improvement Components: modules/join Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-5092.patch The join module throws exceptions when the parents filter isn't a FixedBitSet. The reason is that the join module relies on prevSetBit to find the first child document given a parent ID. As suggested by Uwe and Paul Elschot on LUCENE-5081, we could fix it by exposing methods in the iterators to iterate backwards. When the join module gets an iterator which isn't able to iterate backwards, it would just need to dump its content into another DocIdSet that supports backward iteration, FixedBitSet for example.
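The prevSetBit-based child lookup this issue relies on can be sketched with `java.util.BitSet`, which already supports backward scans via `previousSetBit`. This is an illustrative sketch, not the join module's actual code; the helper name is hypothetical. In the block-join index layout, child documents immediately precede their parent, so the first child of a parent is the document just after the previous parent bit.

```java
import java.util.BitSet;

// Hypothetical sketch of how a block-join collector locates the first child
// of a parent document via backward iteration (the prevSetBit idea).
public class BlockJoinSketch {
    // Children of parentDoc occupy the doc IDs strictly between the
    // previous parent and parentDoc.
    static int firstChildOf(BitSet parentBits, int parentDoc) {
        int prevParent = parentBits.previousSetBit(parentDoc - 1);
        return prevParent + 1;
    }

    public static void main(String[] args) {
        BitSet parents = new BitSet();
        parents.set(3); // docs 0-2 are children of parent 3
        parents.set(7); // docs 4-6 are children of parent 7
        System.out.println(firstChildOf(parents, 7)); // prints 4
        System.out.println(firstChildOf(parents, 3)); // prints 0
    }
}
```

A DocIdSet that cannot answer such a backward query would, per the issue description, first be dumped into a structure that can (e.g. a FixedBitSet).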
[jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field
[ https://issues.apache.org/jira/browse/SOLR-2242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883528#comment-13883528 ] Vassil Velichkov commented on SOLR-2242: I really hope that this issue will be resolved in SOLR 4.7... Fingers crossed :-) Get distinct count of names for a facet field - Key: SOLR-2242 URL: https://issues.apache.org/jira/browse/SOLR-2242 Project: Solr Issue Type: New Feature Components: Response Writers Affects Versions: 4.0-ALPHA Reporter: Bill Bell Priority: Minor Fix For: 4.7 Attachments: SOLR-2242-3x.patch, SOLR-2242-3x_5_tests.patch, SOLR-2242-solr40-3.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.patch, SOLR-2242.shard.withtests.patch, SOLR-2242.solr3.1-fix.patch, SOLR-2242.solr3.1.patch, SOLR.2242.solr3.1.patch When returning facet.field=name of field you will get a list of matches for distinct values. This is normal behavior. This patch tells you how many distinct values you have (# of rows). Use with limit=-1 and mincount=1. The feature is called namedistinct. Here is an example: Parameters: facet.numTerms or f.<field>.facet.numTerms = true (default is false) - turn on distinct counting of terms; facet.field - the field to count the terms in. It creates a new section in the facet section... http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=true&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=false&facet.limit=-1&facet.field=price http://localhost:8983/solr/select?shards=localhost:8983/solr,localhost:7574/solr&indent=true&q=*:*&facet=true&facet.mincount=1&facet.numTerms=true&facet.limit=-1&facet.field=price This currently only works on facet.field. 
{code}
<lst name="facet_counts">
  <lst name="facet_queries"/>
  <lst name="facet_fields">...</lst>
  <lst name="facet_numTerms">
    <lst name="localhost:8983/solr/">
      <int name="price">14</int>
    </lst>
    <lst name="localhost:8080/solr/">
      <int name="price">14</int>
    </lst>
  </lst>
  <lst name="facet_dates"/>
  <lst name="facet_ranges"/>
</lst>

OR with no sharding:

<lst name="facet_numTerms">
  <int name="price">14</int>
</lst>
{code} Several people use this to get the group.field count (the # of groups).
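For illustration only, the count that facet.numTerms reports per field is simply the number of distinct values, i.e. the number of rows a facet.limit=-1&facet.mincount=1 listing would contain. A minimal sketch of that semantics (hypothetical class and method names, not Solr code):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Toy illustration of the namedistinct semantics: count unique values
// in a field rather than listing every value with its count.
public class DistinctTermsSketch {
    static int numTerms(List<String> fieldValues) {
        return new HashSet<>(fieldValues).size();
    }

    public static void main(String[] args) {
        List<String> prices = Arrays.asList("9.99", "19.99", "9.99", "4.50");
        System.out.println(numTerms(prices)); // prints 3
    }
}
```

This mirrors why the feature is useful for grouping: the distinct count of a group.field is the number of groups, without transferring the full value list.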
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5652: --- Attachment: SOLR-5652.nocommit.patch Ok, this new patch has the following... * new {{\*_dv}} fields in the schema for all the various types w/o using any of the sort missing options * tweaked the simple testing in both the single node and distrib test so that: ** one doc is missing an int value ** we randomly pick either int or int_dv as a field to use in explicit sorts *** currently a nocommit in place to force this to be int_dv ** we explicitly sort on all 3 missing sub-variants (, _first, _last) and check the doc order exactly matches our expectations * includes everything from SOLR-5652.codec.skip.dv.patch... ** ...but there is a nocommit bypassing the codec check so docvalues are always used. With this patch, and these nocommits, it's pretty trivial to reliably reproduce failing seeds that pop up when running... {code} ant test -Dtests.class=\*Cursor\* -Dtests.codec=Lucene40 {code} ...and likewise, my limited testing so far hasn't seen any failures when running this patch with the Lucene45 codec... {code} ant test -Dtests.class=\*Cursor\* -Dtests.codec=Lucene45 {code} Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, SOLR-5652.nocommit.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt Several times now, Uwe's jenkins has encountered a walk already seen ... assertion failure from DistribCursorPagingTest that I've been unable to fathom, let alone reproduce (although sarowe was able to trigger a similar non-reproducible-seed failure on his machine). Using this as a tracking issue to try and make sense of it. 
Summary of things noticed so far: * So far only seen on http://jenkins.thetaphi.de and sarowe's mac * So far seen on MacOSX and Linux * So far seen on branch 4x and trunk * So far seen on Java6, Java7, and Java8 * failures occurred in the first block of randomized testing: ** we've indexed a small number of randomized docs ** we're explicitly looping over every field and sorting in both directions * failures were sorting on one of the \*_dv_last or \*_dv_first fields (docValues=true, either sortMissingLast=true OR sortMissingFirst=true) ** for desc sorts, sort on the same field asc has worked fine just before this (fields are in arbitrary order, but asc is always tried before desc) ** sorting on some other random fields has sometimes been tried before this and worked (specifics of each failure seen in the wild are recorded in comments)
[jira] [Updated] (SOLR-5652) Heisenbug in DistribCursorPagingTest: walk already seen ...
[ https://issues.apache.org/jira/browse/SOLR-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-5652: --- Attachment: SOLR-5652.patch Patch I think is committable - same as SOLR-5652.nocommit.patch but with the nocommits removed, and the (in hindsight) obvious change needed to my new int sort field randomization so that when the codec's docvalues support can't handle missing values, we use the non-docvalues version of that field for the explicit checks of \*_last and \*_first sorting. I'm currently bash-loop hammering on this patch -- would appreciate it if others could try the same. Heisenbug in DistribCursorPagingTest: walk already seen ... - Key: SOLR-5652 URL: https://issues.apache.org/jira/browse/SOLR-5652 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Attachments: 129.log, 372.log, SOLR-5652.codec.skip.dv.patch, SOLR-5652.nocommit.patch, SOLR-5652.patch, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1200.log.txt, jenkins.thetaphi.de_Lucene-Solr-4.x-MacOSX_1217.log.txt
[jira] [Commented] (SOLR-4470) Support for basic http auth in internal solr requests
[ https://issues.apache.org/jira/browse/SOLR-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883656#comment-13883656 ] David Webster commented on SOLR-4470: - Again, appreciate the input; looks like the issue is at least alive. We are meeting Friday on this issue to plot our strategy. I am getting familiar with the specifics of the issue, and am coming to realize the type of HTTP container is largely irrelevant, so long as it is a spec-compliant servlet container (as Tomcat and Jetty are). I do not particularly agree with the need for a container, however. We are gradually moving away from pre-packaged containers ourselves, instead moving towards framework tools like Spring Web and Grizzly2. We write all our own JAAS LoginModules today and have a deep bench when it comes to managing service-side security, be it servlet (RESTful/HTTP), JMS, or anything else. There are pluses and minuses in whether to use standard containers or roll your own Servlet implementation. Another discussion for another day. We have had the same issue present in Solr in our RESTful service implementations in making them secure. We have a maturing RESTful/HTTP security standard, and that requires our REST client code to do very specific things when making downstream requests to secure services that expect a very specific secured request. For instance, I can add a valve to Tomcat to have it check for a user's SiteMinder cookie and then validate it with a call to a Policy server. I could also implement a secret key (Kerberos-type thing). I can implement that capability on the service side via a JAAS LoginModule and Tomcat Valve configuration without digging into Tomcat core code. But on the client side I have to write actual core code to place the SiteMinder token, secret-key encryption, etc. in a cookie or header and send it downstream. I imagine the same must be true in SolrCloud. 
I can lock down the receiver side via configuration and standard container plugins, but it's the sender side that we can do nothing about without some core code modification that would allow us to send whatever security artifacts downstream we deem appropriate. My main fear is performance within the cloud during the sharding processes. Support for basic http auth in internal solr requests - Key: SOLR-4470 URL: https://issues.apache.org/jira/browse/SOLR-4470 Project: Solr Issue Type: New Feature Components: clients - java, multicore, replication (java), SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Jan Høydahl Labels: authentication, https, solrclient, solrcloud, ssl Fix For: 4.7 Attachments: SOLR-4470.patch, SOLR-4470.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r1452629.patch, SOLR-4470_branch_4x_r145.patch
[jira] [Updated] (LUCENE-5409) ToParentBlockJoinCollector.getTopGroups returns empty Groups
[ https://issues.apache.org/jira/browse/LUCENE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Cheng updated LUCENE-5409: --- Attachment: local_history.patch patch file ToParentBlockJoinCollector.getTopGroups returns empty Groups Key: LUCENE-5409 URL: https://issues.apache.org/jira/browse/LUCENE-5409 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6 Environment: Ubuntu 12.04 Reporter: Peng Cheng Assignee: Michael McCandless Priority: Critical Fix For: 4.7 Attachments: local_history.patch Original Estimate: 168h Remaining Estimate: 168h A bug is observed to cause unstable results returned by the getTopGroups function of class ToParentBlockJoinCollector. In the scorer generation stage, the ToParentBlockJoinCollector will automatically rewrite all the associated ToParentBlockJoinQuery instances (and their subqueries), and save them into its in-memory look-up table, namely joinQueryID (see the enroll() method for details). Unfortunately, in the getTopGroups method, the new ToParentBlockJoinQuery parameter is not rewritten (at least users are not expected to do so). When the new one is searched in the old lookup table (considering the impact of rewrite() on hashCode()), the lookup will largely fail and eventually end up with a topGroups collection consisting of only empty groups (their hitCounts are guaranteed to be zero). An easy fix would be to rewrite the original BlockJoinQuery before invoking the getTopGroups method. However, the computational cost of this is not optimal. A better but slightly more complex solution would be to save the unrewritten queries into the lookup table.
[jira] [Commented] (LUCENE-5409) ToParentBlockJoinCollector.getTopGroups returns empty Groups
[ https://issues.apache.org/jira/browse/LUCENE-5409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13883693#comment-13883693 ] Peng Cheng commented on LUCENE-5409: Finally got your test case: it only appears at larger scale, which is really excruciating as I'm not a software architect. To run the failed test case, please apply the attached patch or manually copy the unit test function into testBlockJoin.java ToParentBlockJoinCollector.getTopGroups returns empty Groups Key: LUCENE-5409 URL: https://issues.apache.org/jira/browse/LUCENE-5409 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.6 Environment: Ubuntu 12.04 Reporter: Peng Cheng Assignee: Michael McCandless Priority: Critical Fix For: 4.7 Attachments: local_history.patch Original Estimate: 168h Remaining Estimate: 168h
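The lookup failure described in this issue can be modeled in miniature: if rewrite() changes equals()/hashCode(), a map keyed by the rewritten query misses when probed with the unrewritten one. A toy sketch (hypothetical names, not Lucene code; plain strings stand in for Query objects):

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the bug: enroll() keys the joinQueryID table by the
// *rewritten* query, so a lookup with the original, unrewritten query
// misses and the collector returns only empty groups.
public class RewriteLookupSketch {
    // Stand-in for Query.rewrite(): yields a different (rewritten) key.
    static String rewrite(String query) {
        return query + "#rewritten";
    }

    public static void main(String[] args) {
        Map<String, Integer> joinQueryID = new HashMap<>();
        joinQueryID.put(rewrite("child:foo"), 0);          // what enroll() stores
        System.out.println(joinQueryID.get("child:foo"));  // prints null -> empty groups
        // The "easy fix" from the description: rewrite before the lookup.
        System.out.println(joinQueryID.get(rewrite("child:foo"))); // prints 0
    }
}
```

The alternative fix proposed in the description (keying the table by the unrewritten query) removes the miss without paying for an extra rewrite at getTopGroups time.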
[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.7.0_51) - Build # 9162 - Still Failing!
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/9162/ Java: 64bit/jdk1.7.0_51 -XX:-UseCompressedOops -XX:+UseG1GC 1 tests failed. FAILED: org.apache.solr.client.solrj.impl.CloudSolrServerTest.testDistribSearch Error Message: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:49255 within 3 ms Stack Trace: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Could not connect to ZooKeeper 127.0.0.1:49255 within 3 ms at __randomizedtesting.SeedInfo.seed([563CD1B6D5724F09:D7DA5FAEA22D2F35]:0) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:147) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:98) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:93) at org.apache.solr.common.cloud.SolrZkClient.init(SolrZkClient.java:84) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:89) at org.apache.solr.cloud.AbstractZkTestCase.buildZooKeeper(AbstractZkTestCase.java:83) at org.apache.solr.cloud.AbstractDistribZkTestBase.setUp(AbstractDistribZkTestBase.java:70) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.setUp(AbstractFullDistribZkTestBase.java:198) at org.apache.solr.client.solrj.impl.CloudSolrServerTest.setUp(CloudSolrServerTest.java:80) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1559) at com.carrotsearch.randomizedtesting.RandomizedRunner.access$600(RandomizedRunner.java:79) at com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:771) at com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:787) at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50) at org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:358) at com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:782) at com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:442) at com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:746) at com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:648) at com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:682) at com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:693) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46) at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42) at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) at