[jira] [Updated] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-09-30 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5248:
---

Attachment: LUCENE-5248.patch

Patch with testTonsOfUpdates: it's a really nasty test with which I hit OOM when 
running with -Dtests.nightly=true and -Dtests.multiplier=3. It first adds many 
documents (a few tens of thousands; with these params, 250K) with several update 
terms (so that each term affects many docs) and a few NDV fields. It then applies 
many numeric updates (with these params, 20K), but sets IW's RAM buffer to 512 
bytes, so we get many flushes.

Because the resolved updates are currently held in RALD, and since the test 
doesn't invoke any merge while applying the updates, they just keep 
accumulating there until RAM is exhausted. I should say that even when running 
the test with fewer docs, update terms and updates, I saw memory keep 
growing, though not enough to hit the heap space limit. But perhaps that means 
we can use RamUsageEstimator to assert that IW's RAM consumption doesn't 
continuously increase, so that we catch this even when an OOM isn't hit?
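
A minimal sketch of what such an assertion could look like (the batch count, 
slack factor, and applyNumericUpdates helper are illustrative, not from the 
patch):

{code}
// Hypothetical test fragment: measure IndexWriter's retained size across
// update batches and fail if it grows without bound.
long baseline = RamUsageEstimator.sizeOf(writer); // org.apache.lucene.util.RamUsageEstimator
for (int batch = 0; batch < 100; batch++) {
  applyNumericUpdates(writer, batch); // assumed helper: updateNumericDocValue calls + flushes
  long now = RamUsageEstimator.sizeOf(writer);
  // allow slack for transient buffers, but not continuous growth
  assertTrue("IW RAM keeps growing: " + now + " bytes", now < 10 * baseline);
}
{code}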

I plan to handle it in two steps:

# Stop buffering updates in RALD except in the isMerging case, but still use the 
Map<String,Map<Integer,Long>>. Then we should see IW's sizeOf remain somewhat 
stable, and the test shouldn't OOM.
# Optimize the temporary spike in RAM (and the buffering for isMerging) by 
trying a map on primitives (no compression, but no object allocations) and a 
compressed structure (with compression, but more complicated code); see the 
sketch after this list.
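
For illustration only, a minimal sketch of the "map on primitives" direction (a 
hypothetical structure, not the one the patch implements): parallel int[]/long[] 
arrays avoid per-entry Integer/Long allocations while still satisfying the four 
requirements from the issue description.

{code}
import java.util.Arrays;

/** Hypothetical primitive docID -> value map for one field's numeric updates. */
class NumericUpdates {
  private int[] docs = new int[16];
  private long[] values = new long[16];
  private int size;

  /** Un-ordered writes; last update wins (overwrite when the doc is present). */
  void put(int doc, long value) {
    int idx = Arrays.binarySearch(docs, 0, size, doc);
    if (idx >= 0) {
      values[idx] = value; // same doc updated again: last update wins
      return;
    }
    idx = -idx - 1; // insertion point: keeps the arrays sorted by docID
    if (size == docs.length) {
      docs = Arrays.copyOf(docs, size * 2);
      values = Arrays.copyOf(values, size * 2);
    }
    System.arraycopy(docs, idx, docs, idx + 1, size - idx);
    System.arraycopy(values, idx, values, idx + 1, size - idx);
    docs[idx] = doc;
    values[idx] = value;
    size++;
  }

  /** Sequential, doc-ordered read for the flush path. */
  int size() { return size; }
  int docAt(int i) { return docs[i]; }
  long valueAt(int i) { return values[i]; }
}
{code}

Sorted insertion costs O(n) per update; a real implementation would more likely 
append and sort once before flush, or use packed ints for compression, but the 
memory win over boxed Integer/Long entries is the point here.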

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5248.patch
>
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map<String,Map<Integer,Long>>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB affects doc=2, then the 
> updates are applied in that order, meaning we cannot rely on updates arriving 
> in doc order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, and therefore needs 
> to be efficient w.r.t. memory consumption.
> +Map<Integer,Map<String,Long>>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-2899) Add OpenNLP Analysis capabilities as a module

2013-09-30 Thread rashi gandhi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782654#comment-13782654
 ] 

rashi gandhi commented on LUCENE-2899:
--

Hi, 
I have applied this patch successfully on the latest Solr 4.x branch, but now I 
am not sure how to perform contextual searches on the data I have. I need to 
search a text field using some NLP processing. I am new to NLP, so I need some 
help on how to proceed. How do I train a model using this integrated Solr? Do I 
need to study something else before moving ahead with this?

I designed an analyzer and tried indexing data, but the results are weird and 
inconsistent. Kindly provide some pointers to move ahead.

Thanks in advance.

> Add OpenNLP Analysis capabilities as a module
> -
>
> Key: LUCENE-2899
> URL: https://issues.apache.org/jira/browse/LUCENE-2899
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: modules/analysis
>Reporter: Grant Ingersoll
>Assignee: Grant Ingersoll
>Priority: Minor
> Fix For: 5.0, 4.5
>
> Attachments: LUCENE-2899-current.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, LUCENE-2899.patch, 
> LUCENE-2899.patch, LUCENE-2899-RJN.patch, LUCENE-2899-x.patch, 
> LUCENE-2899-x.patch, LUCENE-2899-x.patch, OpenNLPFilter.java, 
> OpenNLPFilter.java, OpenNLPTokenizer.java, opennlp_trunk.patch
>
>
> Now that OpenNLP is an ASF project and has a nice license, it would be nice 
> to have a submodule (under analysis) that exposed capabilities for it. Drew 
> Farris, Tom Morton and I have code that does:
> * Sentence Detection as a Tokenizer (could also be a TokenFilter, although it 
> would have to change slightly to buffer tokens)
> * NamedEntity recognition as a TokenFilter
> We are also planning a Tokenizer/TokenFilter that can put parts of speech as 
> either payloads (PartOfSpeechAttribute?) on a token or at the same position.
> I'd propose it go under:
> modules/analysis/opennlp
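
For orientation, a minimal standalone OpenNLP sentence-detection sketch 
(en-sent.bin is one of the standard pre-trained OpenNLP models and is assumed 
to be available locally; this shows the underlying library call, not the 
module's Tokenizer API):

{code}
import java.io.FileInputStream;
import java.io.InputStream;
import opennlp.tools.sentdetect.SentenceDetectorME;
import opennlp.tools.sentdetect.SentenceModel;

public class SentenceDemo {
  public static void main(String[] args) throws Exception {
    try (InputStream in = new FileInputStream("en-sent.bin")) {
      SentenceModel model = new SentenceModel(in);
      SentenceDetectorME detector = new SentenceDetectorME(model);
      // splits raw text into sentences; the module wraps this as a Tokenizer
      for (String sentence : detector.sentDetect("First sentence. Second one.")) {
        System.out.println(sentence);
      }
    }
  }
}
{code}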



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-4032:


Affects Version/s: (was: 5.0)
   4.0
Fix Version/s: 4.1

> Files larger than an internal buffer size fail to replicate
> ---
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
>Priority: Blocker
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-4032) Files larger than an internal buffer size fail to replicate

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar closed SOLR-4032.
---


The fix was released in 4.1

> Files larger than an internal buffer size fail to replicate
> ---
>
> Key: SOLR-4032
> URL: https://issues.apache.org/jira/browse/SOLR-4032
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
>Priority: Blocker
> Fix For: 4.1, 5.0
>
> Attachments: SOLR-4032.patch
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/trunk-is-unable-to-replicate-between-nodes-Unable-to-download-completely-td4017049.html
>  and 
> http://lucene.472066.n3.nabble.com/Possible-memory-leak-in-recovery-td4017833.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-4033) No lockType configured for NRTCachingDirectory

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar closed SOLR-4033.
---


This fix was released in 4.1

> No lockType configured for NRTCachingDirectory
> --
>
> Key: SOLR-4033
> URL: https://issues.apache.org/jira/browse/SOLR-4033
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/No-lockType-configured-for-NRTCachingDirectory-td4017235.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4033) No lockType configured for NRTCachingDirectory

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-4033:


Affects Version/s: (was: 5.0)
   4.0
Fix Version/s: 4.1

> No lockType configured for NRTCachingDirectory
> --
>
> Key: SOLR-4033
> URL: https://issues.apache.org/jira/browse/SOLR-4033
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0
> Environment: 5.0-SNAPSHOT 1366361:1404534M - markus - 2012-11-01 
> 12:37:38
> Debian Squeeze, Tomcat 6, Sun Java 6, 10 nodes, 10 shards, rep. factor 2.
>Reporter: Markus Jelsma
>Assignee: Mark Miller
> Fix For: 4.1, 5.0
>
>
> Please see: 
> http://lucene.472066.n3.nabble.com/No-lockType-configured-for-NRTCachingDirectory-td4017235.html



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5292) Add SKIP_MUTATE_SINGLETON to FieldValueMutatingUpdateProcessor

2013-09-30 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5292:
---

Attachment: SOLR-5292.patch

Patch implementing SKIP_MUTATE_SINGLETON.  I will run tests.  I'm not sure how 
to test the new functionality - there was no test that I could find for 
DELETE_VALUE_SINGLETON, which served as the inspiration for this.
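
For context, a sketch of how the new singleton would be used in a mutateValue 
override (SKIP_MUTATE_SINGLETON is the constant proposed by this patch; 
DELETE_VALUE_SINGLETON already exists; selector and next are assumed in scope, 
a FieldNameSelector and the next UpdateRequestProcessor in the chain):

{code}
FieldValueMutatingUpdateProcessor trimmer =
    new FieldValueMutatingUpdateProcessor(selector, next) {
      @Override
      protected Object mutateValue(Object src) {
        if (!(src instanceof String)) {
          return SKIP_MUTATE_SINGLETON;   // proposed: leave the value unchanged
        }
        String trimmed = ((String) src).trim();
        return trimmed.isEmpty()
            ? DELETE_VALUE_SINGLETON      // existing: remove the value entirely
            : trimmed;
      }
    };
{code}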

> Add SKIP_MUTATE_SINGLETON to FieldValueMutatingUpdateProcessor
> --
>
> Key: SOLR-5292
> URL: https://issues.apache.org/jira/browse/SOLR-5292
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.5
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Minor
> Fix For: 5.0, 4.6
>
> Attachments: SOLR-5292.patch
>
>
> While trying to solve a problem, I came up with something that didn't work 
> for what I was trying to do, but seemed like it might be useful for somebody. 
>  This allows somebody to decide during the course of a mutateValue method 
> that they'd rather skip the mutate entirely - leave the source field value 
> completely unchanged.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-5292) Add SKIP_MUTATE_SINGLETON to FieldValueMutatingUpdateProcessor

2013-09-30 Thread Shawn Heisey (JIRA)
Shawn Heisey created SOLR-5292:
--

 Summary: Add SKIP_MUTATE_SINGLETON to 
FieldValueMutatingUpdateProcessor
 Key: SOLR-5292
 URL: https://issues.apache.org/jira/browse/SOLR-5292
 Project: Solr
  Issue Type: Improvement
Affects Versions: 4.5
Reporter: Shawn Heisey
Assignee: Shawn Heisey
Priority: Minor
 Fix For: 5.0, 4.6


While trying to solve a problem, I came up with something that didn't work for 
what I was trying to do, but seemed like it might be useful for somebody.  This 
allows somebody to decide during the course of a mutateValue method that they'd 
rather skip the mutate entirely - leave the source field value completely 
unchanged.




--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-4626) TestCoreContainer Fail: AlreadyClosedException: this IndexWriter is closed

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-4626.
-

   Resolution: Fixed
Fix Version/s: (was: 4.5)
   4.2.1
   4.3

Fixed as part of SOLR-4638 in Solr 4.2.1 and 4.3

> TestCoreContainer Fail: AlreadyClosedException: this IndexWriter is closed
> --
>
> Key: SOLR-4626
> URL: https://issues.apache.org/jira/browse/SOLR-4626
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 5.0, 4.3, 4.2.1
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Closed] (SOLR-4626) TestCoreContainer Fail: AlreadyClosedException: this IndexWriter is closed

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar closed SOLR-4626.
---


> TestCoreContainer Fail: AlreadyClosedException: this IndexWriter is closed
> --
>
> Key: SOLR-4626
> URL: https://issues.apache.org/jira/browse/SOLR-4626
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 4.2.1, 4.3, 5.0
>
>




--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782506#comment-13782506
 ] 

Robert Muir commented on SOLR-5277:
---

As far as the query side goes, I have no idea whether SolrDispatchFilter (or 
whatever handles it) could/should call Thread.currentThread().setName(x) (and 
maybe restore the old name after).

I'm just saying there is the possibility to keep everything simple and be more 
consistent.
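
A minimal sketch of that set-and-restore pattern (the placement inside 
SolrDispatchFilter and the coreName variable are hypothetical):

{code}
// Hypothetical servlet-filter fragment: tag the worker thread with the core
// name for the duration of the request, then restore the original name so
// pooled jetty threads are not permanently renamed.
Thread t = Thread.currentThread();
String savedName = t.getName();
try {
  t.setName(savedName + " [" + coreName + "]");
  chain.doFilter(request, response);
} finally {
  t.setName(savedName);
}
{code}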

> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782498#comment-13782498
 ] 

Shawn Heisey commented on SOLR-5277:


If there's anywhere I should look (code, docs, or both) for an overview of 
thread management in Solr, please let me know. I'm updating from my phone on 
the train, but I will be on IRC later this evening, mountain US timezone.

> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782497#comment-13782497
 ] 

Robert Muir commented on SOLR-5277:
---

I searched with eclipse for "commitScheduler" and found it here.

So here is an untested example of what I mean:

{noformat}
Index: solr/core/src/java/org/apache/solr/update/CommitTracker.java
===================================================================
--- solr/core/src/java/org/apache/solr/update/CommitTracker.java    (revision 1527836)
+++ solr/core/src/java/org/apache/solr/update/CommitTracker.java    (working copy)
@@ -53,8 +53,7 @@
   private int docsUpperBound;
   private long timeUpperBound;
   
-  private final ScheduledExecutorService scheduler = 
-      Executors.newScheduledThreadPool(1, new DefaultSolrThreadFactory("commitScheduler"));
+  private final ScheduledExecutorService scheduler;
   private ScheduledFuture pending;
   
   // state
@@ -72,6 +71,7 @@
   public CommitTracker(String name, SolrCore core, int docsUpperBound, int timeUpperBound, boolean openSearcher, boolean softCommit) {
     this.core = core;
     this.name = name;
+    scheduler = Executors.newScheduledThreadPool(1, new DefaultSolrThreadFactory("commitScheduler-" + core.getName()));
     pending = null;
 
     this.docsUpperBound = docsUpperBound;
{noformat}

> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782490#comment-13782490
 ] 

Shawn Heisey commented on SOLR-5277:


I will dig around and see what I can do with thread names.

If you already know where to look because of your familiarity with the code, 
I'm perfectly willing to accept help!

> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5249) All Lucene/Solr modules should use the same dependency versions

2013-09-30 Thread Steve Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Rowe updated LUCENE-5249:
---

Attachment: LUCENE-5162.patch

Patch implementing the idea.

Introduces {{lucene/ivy-versions.properties}}, included from 
{{lucene/ivy-settings.xml}}, where all dependency versions are stored as 
properties of the form {{/org/name = rev}}, e.g. {{/commons-io/commons-io = 
2.1}}.  There are two shared revs, {{jetty.version}} and {{hadoop.version}}, 
which are defined in {{lucene/ivy-versions.properties}} and interpolated into 
the individual revs there.

I thought about using Maven coordinate-style syntax, with a colon between the 
dependency's org and its name, but colons have to be escaped in property-file 
syntax, since the colon is a metachar equivalent to '=', so it looked clunky.  
The path-ish slash style works everywhere I tried it, including as cmdline 
sysprops, and it provides a sort of namespace for these properties.
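
For illustration, entries in {{lucene/ivy-versions.properties}} would look 
something like this (the commons-io line is from the description above; the 
other lines are made-up examples of the shared-rev interpolation):

{noformat}
/commons-io/commons-io = 2.1

hadoop.version = 2.0.5-alpha
/org.apache.hadoop/hadoop-common = ${hadoop.version}

jetty.version = 8.1.10.v20130312
/org.eclipse.jetty/jetty-server = ${jetty.version}
{noformat}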

I also switched to loading {{lucene/ivy-versions.properties}} in the 
{{-check-forbidden-java-apis}} target in {{solr/build.xml}}, to access the 
{{commons-io:commons-io}} version, used in locating the appropriate definitions 
file.

This patch also effectively upgrades the httpcomponents dependencies in the 
{{lucene/replicator}} module to the versions used in Solr; the required 
checksums are swapped in under {{lucene/licenses/}}.

{{ant precommit}} and {{ant test}} both pass after I {{rm $(find . -name 
'*.jar')}}.

I'll commit in a day or so if there are no objections.

> All Lucene/Solr modules should use the same dependency versions
> ---
>
> Key: LUCENE-5249
> URL: https://issues.apache.org/jira/browse/LUCENE-5249
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: general/build
>Reporter: Steve Rowe
>Assignee: Steve Rowe
>Priority: Minor
> Attachments: LUCENE-5162.patch
>
>
> [~markrmil...@gmail.com] wrote on the dev list:
> {quote}
> I'd like it for some things if we actually kept the versions somewhere else - 
> for instance, Hadoop dependencies should match across the mr module and the 
> core module.
> Perhaps we could define versions for dependencies across multiple modules 
> that should probably match, in a prop file or ant file and use sys sub for 
> them in the ivy files.
> For something like Hadoop, that would also make it simple to use Hadoop 1 
> rather than 2 with a single sys prop override. Same with some other 
> depenencies.
> {quote}



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782485#comment-13782485
 ] 

Robert Muir commented on SOLR-5277:
---

I think for the query case you get a jetty thread name (e.g. 
qtp1640764503-13617); I'm not really sure who names it, but I suspect jetty 
does. The update cases, though, look like threads created by Solr (e.g. 
commitScheduler-7-thread-1).

If either or both of these, when created, could include the name of the core, 
then the logging classes wouldn't have to concern themselves with adding the 
core's name, and it would live in just one place.


> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782464#comment-13782464
 ] 

Shawn Heisey commented on SOLR-5277:


I'm pretty much a beginner at this.  I had a few programming classes a long 
time ago, and started teaching myself Java about two years ago. :)  This is the 
only idea I have right now about how to do it.

Better thread names sound pretty good to me!  I'm willing to give it a try if 
you can point me in the right direction.

It looks like the thread name is included in the example log4j.properties, but 
I don't think I included it in the one I built myself.  If that's how we want 
to tackle this, the logging docs should probably mention that a lot of useful 
information can only be obtained from the thread names.
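
For reference, the thread name comes from the {{%t}} conversion in the log4j 
1.x pattern; a minimal, illustrative appender config:

{noformat}
log4j.rootLogger=INFO, CONSOLE
log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
# %t prints the thread name, e.g. [commitScheduler-7-thread-1]
log4j.appender.CONSOLE.layout.ConversionPattern=%d{HH:mm:ss.SSS} [%t] %-5p %c - %m%n
{noformat}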


> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5249) All Lucene/Solr modules should use the same dependency versions

2013-09-30 Thread Steve Rowe (JIRA)
Steve Rowe created LUCENE-5249:
--

 Summary: All Lucene/Solr modules should use the same dependency 
versions
 Key: LUCENE-5249
 URL: https://issues.apache.org/jira/browse/LUCENE-5249
 Project: Lucene - Core
  Issue Type: Improvement
  Components: general/build
Reporter: Steve Rowe
Assignee: Steve Rowe
Priority: Minor


[~markrmil...@gmail.com] wrote on the dev list:

{quote}
I'd like it for some things if we actually kept the versions somewhere else - 
for instance, Hadoop dependencies should match across the mr module and the 
core module.

Perhaps we could define versions for dependencies across multiple modules that 
should probably match, in a prop file or ant file and use sys sub for them in 
the ivy files.

For something like Hadoop, that would also make it simple to use Hadoop 1 
rather than 2 with a single sys prop override. Same with some other depenencies.
{quote}




--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782421#comment-13782421
 ] 

Robert Muir commented on SOLR-5277:
---

Is it really necessary to do it this way, or can the threads (which are already 
printed) simply be named better?

> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5277:
---

Attachment: SOLR-5277.patch

Patch adding [core] to logging in some places where it is easily obtained.

> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
> Attachments: SOLR-5277.patch
>
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5277) Stamp core names on log entries for certain classes

2013-09-30 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782370#comment-13782370
 ] 

Shawn Heisey commented on SOLR-5277:


{quote}
06:57:53.577 [qtp1640764503-13617] INFO org.apache.solr.core.SolrCore - 
SolrDeletionPolicy.onCommit: commits:num=2
{quote}

This one is in SolrDeletionPolicy.onCommit.  The logger for this class uses 
SolrCore.class ... is this correct, or should the logging refer to the actual 
class?  I can't see a way to get the core name for this particular log entry.  
Because of the way that onCommit gets called, I don't even know where the call 
is happening.

{quote}
11:53:19.056 [coreLoadExecutor-3-thread-1] INFO org.apache.solr.core.SolrCore - 
Soft AutoCommit: if uncommited for 1000ms;
{quote}

This is in CommitTracker.java.  Similar to the previous one, I can't see how I 
could get the core name.

{quote}
14:45:24.447 [commitScheduler-9-thread-1] INFO 
org.apache.solr.update.UpdateHandler - start commit
{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}

06:57:53.591 [qtp1640764503-13617] INFO org.apache.solr.update.UpdateHandler - 
end_commit_flush

14:45:24.553 [commitScheduler-7-thread-1] INFO 
org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
{quote}

I found these.  Patch coming soon.


> Stamp core names on log entries for certain classes
> ---
>
> Key: SOLR-5277
> URL: https://issues.apache.org/jira/browse/SOLR-5277
> Project: Solr
>  Issue Type: Bug
>  Components: search, update
>Affects Versions: 4.3.1, 4.4, 4.5
>Reporter: Dmitry Kan
>
> It is handy that certain Java classes stamp a [coreName] on a log entry. It 
> would be useful for multicore setup if more classes would stamp this 
> information.
> In particular we came accross a situaion with commits coming in a quick 
> succession to the same multicore shard and found it to be hard time figuring 
> out was it the same core or different cores.
> The classes in question with log sample output:
> o.a.s.c.SolrCore
> 06:57:53.577 [qtp1640764503-13617] INFO  org.apache.solr.core.SolrCore - 
> SolrDeletionPolicy.onCommit: commits:num=2
> 11:53:19.056 [coreLoadExecutor-3-thread-1] INFO  
> org.apache.solr.core.SolrCore - Soft AutoCommit: if uncommited for 1000ms;
> o.a.s.u.UpdateHandler
> 14:45:24.447 [commitScheduler-9-thread-1] INFO  
> org.apache.solr.update.UpdateHandler - start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=true,prepareCommit=false}
> 06:57:53.591 [qtp1640764503-13617] INFO  org.apache.solr.update.UpdateHandler 
> - end_commit_flush
> o.a.s.s.SolrIndexSearcher
> 14:45:24.553 [commitScheduler-7-thread-1] INFO  
> org.apache.solr.search.SolrIndexSearcher - Opening Searcher@1067e5a9 main
> The original question was posted on #solr and on SO:
> http://stackoverflow.com/questions/19026577/how-to-output-solr-core-name-with-log4j



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782355#comment-13782355
 ] 

Jan Høydahl commented on SOLR-5287:
---

Yea, +1 to supporting the short name in the schema. It's more readable, after all.

I played with a sample self-documentation format which the GUI could use to 
pull plugin documentation from the server instead of hardcoding it in HTML. It 
could also be used to generate this part of the ref-guide documentation and 
avoid mismatches. And it would benefit Lucene and ES users as well as Solr users!

Example return string for {{synonymFilterFactory#getConfigSpec()}}
{code}
{ "key" : "synonym",
  "name" : "Synonym filter",
  "desc" : "Token filter which expands synonyms from a provided dictionary",
  "see" : 
"http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilter.html";,
  "params" : [
  { "name" : "synonyms", "info" : "Name of synonym dictionary file", 
"value" : "STRING" },
  { "name" : "format", "info" : "Specify format of the dictionary. May be 
solr or snowball", "value" : [ "solr", "snowball" ]},
  { "name" : "ignoreCase", "info" : "Set to true for case insensitive", 
"value" : "BOOLEAN" },
  { "name" : "expand", "info" : "If true, a synonym will be expanded to all 
equivalent synonyms. If false, all equivalent synonyms will be reduced to the 
first in the list", "value" : "BOOLEAN" },
  { "name" : "tokenizerFactory", "info" : "Which tokenizer to use when 
parsing the dictionary. Use either shortname or class name", "value" : 
"ref:TOKENIZERS"}
   ]
}
{code}

Well, I guess we're way off-track here compared to the original issue. Let's 
spin off separate JIRAs for whatever we'd like to achieve :)

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: No longer allowed to store html in a 'string' type

2013-09-30 Thread Kevin Cunningham
Wooops, wrong alias.  Posted to user instead.

From: Kevin Cunningham [mailto:kcunning...@telligent.com]
Sent: Monday, September 30, 2013 5:07 PM
To: dev@lucene.apache.org
Subject: No longer allowed to store html in a 'string' type

We have been using Solr for a while now; we went from 1.4 -> 3.6.  While running 
some tests on 4.4, we are no longer allowed to store raw HTML in a document's 
field with a type of 'string', which we used to be able to do. Has something 
changed here?  Now we get the following error: Undeclared general entity 
\"nbsp\"\r\n at [row,col {unknown-source}]: [11,53]

I understand what it's saying and can change the way we store and extract it if 
that's a must, but I would like to understand what changed.  It sounds like 
something just became stricter about adhering to the rules.








RE: No longer allowed to store html in a 'string' type

2013-09-30 Thread Uwe Schindler
You have to correctly escape your XML-like HTML inside the XML you send to Solr 
(using <![CDATA[...]]> or via escaping with &lt; &gt; &quot;). Otherwise Solr 
would be attackable using HTML injection.
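
For example, a field value containing HTML could be posted either way (the 
field name is illustrative):

{code}
<!-- escaped entities: note &nbsp; is an HTML entity, not an XML one, so the
     ampersand itself must be escaped -->
<field name="body">&lt;p&gt;Hello&amp;nbsp;world&lt;/p&gt;</field>

<!-- or wrap the raw HTML in a CDATA section -->
<field name="body"><![CDATA[<p>Hello&nbsp;world</p>]]></field>
{code}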

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

  http://www.thetaphi.de

eMail: u...@thetaphi.de

 

From: Kevin Cunningham [mailto:kcunning...@telligent.com] 
Sent: Tuesday, October 01, 2013 12:07 AM
To: dev@lucene.apache.org
Subject: No longer allowed to store html in a 'string' type

 

We have been using Solr for a while now; we went from 1.4 -> 3.6.  While running 
some tests on 4.4, we are no longer allowed to store raw HTML in a document's 
field with a type of 'string', which we used to be able to do. Has something 
changed here?  Now we get the following error: Undeclared general entity 
\"nbsp\"\r\n at [row,col {unknown-source}]: [11,53]

I understand what it's saying and can change the way we store and extract it if 
that's a must, but I would like to understand what changed.  It sounds like 
something just became stricter about adhering to the rules.

 








 

 



No longer allowed to store html in a 'string' type

2013-09-30 Thread Kevin Cunningham
We have been using Solr for a while now; we went from 1.4 -> 3.6.  While running 
some tests on 4.4, we are no longer allowed to store raw HTML in a document's 
field with a type of 'string', which we used to be able to do. Has something 
changed here?  Now we get the following error: Undeclared general entity 
\"nbsp\"\r\n at [row,col {unknown-source}]: [11,53]

I understand what it's saying and can change the way we store and extract it if 
that's a must, but I would like to understand what changed.  It sounds like 
something just became stricter about adhering to the rules.








[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782320#comment-13782320
 ] 

Uwe Schindler commented on SOLR-5287:
-

See: 
[LUCENE-4044|https://issues.apache.org/jira/browse/LUCENE-4044?focusedCommentId=13421103&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13421103]

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782308#comment-13782308
 ] 

Uwe Schindler commented on SOLR-5287:
-

Jan: The main problem is the crazy transformation Solr does with these names 
for backwards compatibility. SolrResourceLoader detects factories with regexps 
and converts them to the "simple" names taken by the factory. Unfortunately, 
Solr currently does not allow specifying the "canonical name" of the analyzer.

In general we should remove class="solr.FooBarFactory" support from the 
analyzer schema and rename it to, e.g., name="whitespace" without *Factory, 
which gets passed directly to the SPI. For backwards compatibility in 4.x we 
can still resolve "solr.FooBarFactory", but in 5.0 only real, existing class 
names should work (if used with class=).
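
Concretely, schema snippets would move from the first form to something like 
the second (the name= syntax is the proposal here, not an existing feature):

{code}
<!-- today: factory class name, resolved via SolrResourceLoader's regexp magic -->
<tokenizer class="solr.WhitespaceTokenizerFactory"/>

<!-- proposed (hypothetical syntax): SPI name passed through directly -->
<tokenizer name="whitespace"/>
{code}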

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782308#comment-13782308
 ] 

Uwe Schindler edited comment on SOLR-5287 at 9/30/13 9:57 PM:
--

Jan: The main problem is the crazy transformation Solr does with these names 
for backwards compatibility. SolrResourceLoader detects factories with regexps 
and converts them to the "simple" names taken by the SPI. Unfortunately, 
Solr currently does not allow specifying the "canonical name" of the analyzer.

In general we should remove class="solr.FooBarFactory" support from the 
analyzer schema and rename it to, e.g., name="whitespace" without *Factory, 
which gets passed directly to the SPI. For backwards compatibility in 4.x we 
can still resolve "solr.FooBarFactory", but in 5.0 only real, existing class 
names should work (if used with class=).


was (Author: thetaphi):
Jan: The main problem is the crazy transformation Solr does with these names 
for backwards compatibility. SolrResourceLoader detects factories with regexps 
and converts them to the "simple" names taken by the factory. Unfortunately, 
Solr currently does not allow specifying the "canonical name" of the analyzer.

In general we should remove class="solr.FooBarFactory" support from the 
analyzer schema and rename it to, e.g., name="whitespace" without *Factory, 
which gets passed directly to the SPI. For backwards compatibility in 4.x we 
can still resolve "solr.FooBarFactory", but in 5.0 only real, existing class 
names should work (if used with class=).

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






[jira] [Comment Edited] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782308#comment-13782308
 ] 

Uwe Schindler edited comment on SOLR-5287 at 9/30/13 9:58 PM:
--

Jan: The main problem is the crazy transformation Solr does with these names 
for backwards compatibility. SolrResourceLoader detects factories with regexps 
and converts them to the "simple" names taken by the SPI. Unfortunately, Solr 
currently does not allow specifying the "canonical name" of the analyzer.

In general we should remove class="solr.FooBarFactory" support from the 
analyzer schema and rename this to e.g., name="whitespace" without *Factory, 
which gets passed directly to the SPI. For backwards compatibility in 4.x we can 
still resolve "solr.FooBarFactory", but in 5.0 only real existing class names 
should work (if used with class). For "official" factories, only use name="" 
(which also reads much better).
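
For illustration, a hypothetical schema.xml fragment (not from the comment; the 
whitespace tokenizer is just an example):

  <!-- today: class name resolved via SolrResourceLoader's regexp magic -->
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>

  <!-- proposed: the short SPI name, passed directly to the SPI lookup -->
  <tokenizer name="whitespace"/>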


was (Author: thetaphi):
Jan: The main problem is the crazy transformation Solr does with these names 
for backwards compatibility. SolrResourceLoader detects factories with regexps 
and converts it to the "simple" names taken by the SPI. Unfortunately, 
currently Solr does not allow to specify the "canonical name" of the analyzer.

In general we should remove class="solr.FooBarFactory" support from the 
analyzer schema and rename this to e.g., name="whitespace" without *Factory 
that gets directly passed to the SPI. For backwards compatibility in 4.x we can 
still resolve "solr.FooBarFactory", but in 5.0 only real existing class names 
should work with 5.0 (is used with class).

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782300#comment-13782300
 ] 

Jan Høydahl commented on SOLR-5287:
---

It would be super cool to expose these via some API too, so people could build 
third-party schema builders without hardcoding filter names in the tool itself.

And the next step would be to allow Analysis Factories to implement something 
like {{getConfigSpec()}} to document available/allowed configuration options, 
much like a DTD or schema for allowed params and values. But this is done via 
NamedLists now, so we don't get anything for free through introspection and 
would probably need to invent some internal JSON syntax for this.
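
A purely hypothetical sketch of what such a hook could look like (no such API 
exists; the interface and method names are made up for illustration; NamedList 
is Solr's org.apache.solr.common.util.NamedList):

  // Hypothetical: a factory describes its own init params so that
  // third-party tools or an admin UI wizard can introspect them.
  public interface ConfigSpecProvider {
    /** Allowed parameter names, types and defaults, as a NamedList. */
    NamedList<Object> getConfigSpec();
  }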

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






[JENKINS] Lucene-Solr-4.x-MacOSX (64bit/jdk1.6.0) - Build # 848 - Failure!

2013-09-30 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-MacOSX/848/
Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 34012 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2480 (and 1369 related) class file(s) for forbidden 
API invocations (in 10.41s), 1 error(s).

BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/build.xml:427: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/build.xml:67: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-4.x-MacOSX/solr/build.xml:285: Check for 
forbidden API calls failed, see log.

Total time: 114 minutes 25 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.6.0 -XX:+UseCompressedOops -XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (SOLR-5264) New method on NamedList to return one or many config arguments as collection

2013-09-30 Thread Shawn Heisey (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shawn Heisey updated SOLR-5264:
---

Attachment: SOLR-5264.patch

New patch.  Modifies javadoc as recommended.  Renames removeArgs to 
removeConfigArgs.  Does not seem to introduce any test failures.
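
A minimal sketch of the change from the caller's side (assumed from the patch 
description; exact signatures may differ):

  // Before: static helper buried in the factory class.
  Collection<String> fields =
      FieldMutatingUpdateProcessorFactory.oneOrMany(args, "fieldName");

  // After this patch: the method lives on NamedList itself.
  Collection<String> fields = args.removeConfigArgs("fieldName");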

> New method on NamedList to return one or many config arguments as collection
> 
>
> Key: SOLR-5264
> URL: https://issues.apache.org/jira/browse/SOLR-5264
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 4.5
>Reporter: Shawn Heisey
>Assignee: Shawn Heisey
>Priority: Minor
> Fix For: 5.0, 4.6
>
> Attachments: SOLR-5264.patch, SOLR-5264.patch, SOLR-5264.patch, 
> SOLR-5264.patch, SOLR-5264.patch
>
>
> FieldMutatingUpdateProcessorFactory has a method called "oneOrMany" 
> that takes all of the entries in a NamedList and pulls them out into a 
> Collection.  I'd like to use that in a custom update processor I'm building.
> It seems as though this functionality would be right at home as part of 
> NamedList itself.  Here's a patch that moves the method.






[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782272#comment-13782272
 ] 

Uwe Schindler commented on SOLR-5287:
-

Yes, it works; you can list all factories registered via SPI:
- 
[http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/util/TokenizerFactory.html#availableTokenizers()]
- 
[http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/util/TokenFilterFactory.html#availableTokenFilters()]
- 
[http://lucene.apache.org/core/4_4_0/analyzers-common/org/apache/lucene/analysis/util/CharFilterFactory.html#availableCharFilters()]

FYI: The same works with Codecs,...
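
A minimal sketch of that listing in code (Lucene 4.x analysis-common; variable 
names are illustrative):

  import java.util.Set;
  import org.apache.lucene.analysis.util.CharFilterFactory;
  import org.apache.lucene.analysis.util.TokenFilterFactory;
  import org.apache.lucene.analysis.util.TokenizerFactory;

  // All SPI-registered analysis factories, by their short names:
  Set<String> tokenizers   = TokenizerFactory.availableTokenizers();
  Set<String> tokenFilters = TokenFilterFactory.availableTokenFilters();
  Set<String> charFilters  = CharFilterFactory.availableCharFilters();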

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782255#comment-13782255
 ] 

Jan Høydahl commented on SOLR-5287:
---

This could be done in steps, for sure. First, add the ability to POST an entire 
schema through the REST API and implement that in the Admin GUI, much like your 
original plan. Then implement support for the rest of today's Schema API 
(https://cwiki.apache.org/confluence/display/solr/Schema+API). Finally, extend 
the API to delete and modify stuff.

By SPI I mean Service Provider Interface, where each token filter etc. registers 
itself so you can discover them; e.g., you could refer to the Synonym filter by 
"synonym" instead of "solr.SynonymFilterFactory" since it's registered as a 
service. I believe it should be possible to list all registered components, but I 
have not tried.

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






[jira] [Commented] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782245#comment-13782245
 ] 

ASF subversion and git services commented on SOLR-3530:
---

Commit 1527786 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1527786 ]

SOLR-3530: Add missing 'throw'.

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> spin off from SOLR-3258, it would be helpful if SolrJ, when encountering 
> Exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser) did some validation of the Content-Type returned by 
> the remote server, to give better error messages in cases where 
> misconfiguration has the wrong matchup between ResponseParser and mime-type (or 
> worse: an HTML page being returned by a non-Solr server)






[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 2041 - Still Failing

2013-09-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/2041/

All tests passed

Build Log:
[...truncated 25923 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2480 (and 1369 related) class file(s) for forbidden 
API invocations (in 2.13s), 1 error(s).

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:427:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:67:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/solr/build.xml:285:
 Check for forbidden API calls failed, see log.

Total time: 65 minutes 1 second
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782242#comment-13782242
 ] 

Stefan Matheis (steffkes) commented on SOLR-5287:
-

the simple {{textarea}} was only meant to be a starting point, calling a REST 
API is fine as well - would need to know what options are available on that, but 
in general that's possible as well, for sure.

offering some kind of a wizard is always a bit tricky since you have to offer 
really _all_ the possible options, otherwise some people can't use it and/or 
you have to provide that ugly "others" option, where (if one uses it) the 
complete drag & drop idea goes out of the door :/

simple baseline: i'm fine either way - let's do what we can & what makes sense 
with the features in mind (:

[~janhoy] that "SPI" you mentioned, not sure what the abbreviation stands for?

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






[jira] [Commented] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782243#comment-13782243
 ] 

ASF subversion and git services commented on SOLR-3530:
---

Commit 1527784 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1527784 ]

SOLR-3530: Add missing 'throw'.

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> spin off from SOLR-3258, it would be helpful if SolrJ, when encountering 
> Exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser) did some validation of the Content-Type returned by 
> the remote server, to give better error messages in cases where 
> misconfiguration has the wrong matchup between ResponseParser and mime-type (or 
> worse: an HTML page being returned by a non-Solr server)






[jira] [Updated] (LUCENE-5214) Add new FreeTextSuggester, to handle "long tail" suggestions

2013-09-30 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-5214:
---

Attachment: LUCENE-5214.patch

New patch, resolving all nocommits.  I think it's ready!

> Add new FreeTextSuggester, to handle "long tail" suggestions
> 
>
> Key: LUCENE-5214
> URL: https://issues.apache.org/jira/browse/LUCENE-5214
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/spellchecker
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 5.0, 4.6
>
> Attachments: LUCENE-5214.patch, LUCENE-5214.patch
>
>
> The current suggesters are all based on a finite space of possible
> suggestions, i.e. the ones they were built on, so they can only
> suggest a full suggestion from that space.
> This means if the current query goes outside of that space then no
> suggestions will be found.
> The goal of FreeTextSuggester is to address this, by giving
> predictions based on an ngram language model, i.e. using the last few
> tokens from the user's query to predict the likely following token.
> I got the idea from this blog post about Google's suggest:
> http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html
> This is very much still a work in progress, but it seems to be
> working.  I've tested it on the AOL query logs, using an interactive
> tool from luceneutil to show the suggestions, and it seems to work well.
> It's fun to use that tool to explore the word associations...
> I don't think this suggester would be used standalone; rather, I think
> it'd be a fallback for times when the primary suggester fails to find
> anything.  You can see this behavior on google.com, if you type "the
> fast and the ", you see entire queries being suggested, but then if
> the next word you type is "burning" then suddenly you see the
> suggestions are only based on the last word, not the entire query.
> It uses ShingleFilter under the hood to generate the token ngrams, which are 
> then stored in an FST; once LUCENE-5180 is in, it will be able to properly 
> handle a user query that ends with stop-words (e.g. "wizard of ").






[jira] [Commented] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782231#comment-13782231
 ] 

ASF subversion and git services commented on SOLR-3530:
---

Commit 1527780 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1527780 ]

SOLR-3530: Handle content type correctly when a response parser cannot be used.

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> spin off from SOLR-3258, it would be helpful if SolrJ, when encountering 
> Exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser) did some validation of the Content-Type returned by 
> the remote server, to give better error messages in cases where 
> misconfiguration has the wrong matchup between ResponseParser and mime-type (or 
> worse: an HTML page being returned by a non-Solr server)






[jira] [Commented] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782228#comment-13782228
 ] 

ASF subversion and git services commented on SOLR-3530:
---

Commit 1527776 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1527776 ]

SOLR-3530: Handle content type correctly when a response parser cannot be used.

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> spin off from SOLR-3258, it would be helpful if SolrJ, when encountering 
> Exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser) did some validation of the Content-Type returned by 
> the remote server, to give better error messages in cases where 
> misconfiguration has the wrong matchup between ResponseParser and mime-type (or 
> worse: an HTML page being returned by a non-Solr server)






[jira] [Updated] (SOLR-4787) Join Contrib

2013-09-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 3 join implementations. 
The initial patch was generated from the Solr 4.3 tag. Because of changes in 
the FieldCache API this patch will only build with Solr 4.2 or above.

*HashSetJoinQParserPlugin aka hjoin*

The hjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar in functionality to 
the JoinQParserPlugin but the implementation differs in a couple of important 
ways.

The first difference is that the hjoin is designed to work with int and long 
join keys only. So, in order to use hjoin, int or long join keys must be 
included in both the to and from cores.

The second difference is that the hjoin builds memory structures that are used 
to quickly connect the join keys. So, the hjoin will need more memory than the 
JoinQParserPlugin to perform the join.

The main advantage of the hjoin is that it can scale to join millions of keys 
between cores and provide sub-second response time. The hjoin should work well 
with up to two million results from the fromIndex and tens of millions of 
results from the main query.

The hjoin supports the following features:

1) Both Lucene query and PostFilter implementations. A *"cost"* > 99 will turn 
on the PostFilter. The PostFilter will typically outperform the Lucene query 
when the main query results have been narrowed down.

2) With the Lucene query implementation there is an option to build the filter 
with threads. This can greatly improve the performance of the query if the main 
query index is very large. The "threads" parameter turns on threading. For 
example *threads=6* will use 6 threads to build the filter. This will set up a 
fixed threadpool with six threads to handle all hjoin requests. Once the 
threadpool is created the hjoin will always use it to build the filter. 
Threading does not come into play with the PostFilter.

3) The *size* local parameter can be used to set the initial size of the 
hashset used to perform the join. If this is set above the number of results 
from the fromIndex, then you can avoid hashset resizing, which improves 
performance.

4) Nested filter queries. The local parameter "fq" can be used to nest a filter 
query within the join. The nested fq will filter the results of the join query. 
This can point to another join to support nested joins.

5) Full caching support for the Lucene query implementation. The filterCache 
and queryResultCache should work properly even with deep nesting of joins. Only 
the queryResultCache comes into play with the PostFilter implementation because 
PostFilters are not cacheable in the filterCache.

The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
plugin is referenced by the string "hjoin" rather than "join".

fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
fq=$qq\}user:customer1&qq=group:5

The example filter query above will search the fromIndex (collection2) for 
"user:customer1", applying the local fq parameter to filter the results. The 
Lucene filter query will be built using 6 threads. This query will generate a 
list of values from the "from" field that will be used to filter the main 
query. Only records from the main query where the "to" field is present in the 
"from" list will be included in the results.

The solrconfig.xml in the main query core must contain the reference to the 
hjoin.
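
(The XML snippet was stripped by the mail archiver; a hedged reconstruction - 
the class name below is an assumption:)

  <queryParser name="hjoin"
               class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>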



And the join contrib lib jars must be registered in the solrconfig.xml.
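
(Also stripped; a hedged sketch - the dir and regex values are assumptions:)

  <lib dir="../../../dist/" regex="solr-joins-.*\.jar"/>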

 

After issuing the "ant dist" command from inside the solr directory, the joins 
contrib jar will appear in the solr/dist directory. Place the 
solr-joins-4.*-.jar in the WEB-INF/lib directory of the Solr web application. 
This will ensure that the top-level Solr classloader loads these classes rather 
than the core's classloader. 


*BitSetJoinQParserPlugin aka bjoin*

The bjoin behaves exactly like the hjoin but uses a BitSet instead of a HashSet 
to perform the underlying join. Because of this the bjoin is much faster and 
can provide sub-second response times on result sets of tens of millions of 
records from the fromIndex and hundreds of millions of records from the main 
query.

But there are limitations to how the bjoin can be used. The bjoin treats the 
join keys as addresses in a BitSet and uses the Lucene OpenBitSet 
implementation, which performs very well but is not sparse. So the BitSet 
memory is dictated by the magnitude of the join keys. For example, a bitset 
with a max join key of 200,000,000 will need 25 MB of memory. For this reason 
the BitSet join does not support long join keys. In order to keep memory usage 
down, the join keys should also be packed a

[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.6.0_45) - Build # 7595 - Still Failing!

2013-09-30 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/7595/
Java: 64bit/jdk1.6.0_45 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 25879 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2480 (and 1369 related) class file(s) for forbidden 
API invocations (in 1.84s), 1 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:427: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:67: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build.xml:285: Check for 
forbidden API calls failed, see log.

Total time: 50 minutes 36 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.6.0_45 -XX:+UseCompressedOops 
-XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-09-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782227#comment-13782227
 ] 

Shai Erera commented on LUCENE-5248:


I discussed that w/ Mike on chat and here's the plan we came to:

* Don't buffer updates in RALD anymore. It's silly since, as Mike wrote above, 
one of the reasons we applyDeletes is that IW's RAM buffer limit was 
reached. By buffering updates, we only move the RAM elsewhere, where it's not 
accounted for (RALD).
* Instead, BufferedDeleteStream will build the Map 
structure described above and hand it to RALD.writeFieldUpdates.
* RALD.writeFieldUpdates will execute the portion of the code that is currently 
executed in writeLiveDocs.
** If the segment isn't merging ({{isMerging=false}}), the map is discarded and 
can be GC'd.
** Otherwise, it will need to buffer the resolved updates, so they can later be 
applied to the merged segment (a note on that below).
** That's not bad though, as this is done only temporarily: once the segment 
finishes merging, or the merge is aborted or fails, the buffer is cleared away.

The reason we need to buffer the resolved updates in the {{isMerging}} case 
is that the raw form keeps a docIDUpto, which after merging may make no sense. 
For example, suppose you have two segments to which an update is applied: for 
_0, docIDUpto=MAX_VAL (i.e. it's an already existing segment) and for _1 it's 
17 (i.e. it's a newly flushed segment where updates should be applied up to doc 
17). If you use SortingMP, docIDUpto=17 and MAX_VAL become irrelevant: the docs 
can be entirely shuffled, and then you don't know which docs should receive the 
updates anymore. And if you have SortingMP and deletes, it only 
becomes more complicated.
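
A tiny sketch of the role docIDUpto plays when an update is resolved against a 
segment (illustrative only, not the actual Lucene code; the method is made up):

  // Illustrative gate: an update resolved against a segment applies only
  // to docs that existed when the update arrived. For already existing
  // segments docIDUpto is effectively Integer.MAX_VALUE, so every
  // matching doc is updated.
  static boolean appliesTo(int docID, int docIDUpto) {
    return docID < docIDUpto;
  }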

I think that for now we should buffer the resolved updates, improve the data 
structure used to buffer them, and handle that later.

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, therefore need to be 
> efficient w.r.t. memory consumption.
> +Map>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.






[jira] [Updated] (SOLR-4787) Join Contrib

2013-09-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Description: 
This contrib provides a place where different join implementations can be 
contributed to Solr. This contrib currently includes 3 join implementations. 
The initial patch was generated from the Solr 4.3 tag. Because of changes in 
the FieldCache API this patch will only build with Solr 4.2 or above.

*HashSetJoinQParserPlugin aka hjoin*

The hjoin provides a join implementation that filters results in one core based 
on the results of a search in another core. This is similar in functionality to 
the JoinQParserPlugin but the implementation differs in a couple of important 
ways.

The first difference is that the hjoin is designed to work with int and long 
join keys only. So, in order to use hjoin, int or long join keys must be 
included in both the to and from cores.

The second difference is that the hjoin builds memory structures that are used 
to quickly connect the join keys. So, the hjoin will need more memory than the 
JoinQParserPlugin to perform the join.

The main advantage of the hjoin is that it can scale to join millions of keys 
between cores and provide sub-second response time. The hjoin should work well 
with up to two million results from the fromIndex and tens of millions of 
results from the main query.

The hjoin supports the following features:

1) Both Lucene query and PostFilter implementations. A *"cost"* > 99 will turn 
on the PostFilter. The PostFilter will typically outperform the Lucene query 
when the main query results have been narrowed down.

2) With the Lucene query implementation there is an option to build the filter 
with threads. This can greatly improve the performance of the query if the main 
query index is very large. The "threads" parameter turns on threading. For 
example *threads=6* will use 6 threads to build the filter. This will set up a 
fixed threadpool with six threads to handle all hjoin requests. Once the 
threadpool is created the hjoin will always use it to build the filter. 
Threading does not come into play with the PostFilter.

3) The *size* local parameter can be used to set the initial size of the 
hashset used to perform the join. If this is set above the number of results 
from the fromIndex, then you can avoid hashset resizing, which improves 
performance.

4) Nested filter queries. The local parameter "fq" can be used to nest a filter 
query within the join. The nested fq will filter the results of the join query. 
This can point to another join to support nested joins.

5) Full caching support for the Lucene query implementation. The filterCache 
and queryResultCache should work properly even with deep nesting of joins. Only 
the queryResultCache comes into play with the PostFilter implementation because 
PostFilters are not cacheable in the filterCache.

The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
plugin is referenced by the string "hjoin" rather than "join".

fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
fq=$qq\}user:customer1&qq=group:5

The example filter query above will search the fromIndex (collection2) for 
"user:customer1", applying the local fq parameter to filter the results. The 
Lucene filter query will be built using 6 threads. This query will generate a 
list of values from the "from" field that will be used to filter the main 
query. Only records from the main query where the "to" field is present in the 
"from" list will be included in the results.

The solrconfig.xml in the main query core must contain the reference to the 
hjoin.



And the join contrib lib jars must be registered in the solrconfig.xml.

 

After issuing the "ant dist" command from inside the solr directory, the joins 
contrib jar will appear in the solr/dist directory. Place the 
solr-joins-4.*-.jar in the WEB-INF/lib directory of the Solr web application. 
This will ensure that the top-level Solr classloader loads these classes rather 
than the core's classloader. 


*BitSetJoinQParserPlugin aka bjoin*

The bjoin behaves exactly like the hjoin but uses a BitSet instead of a HashSet 
to perform the underlying join. Because of this the bjoin is much faster and 
can provide sub-second response times on result sets of tens of millions of 
records from the fromIndex and hundreds of millions of records from the main 
query.

But there are limitations to how the bjoin can be used. The bjoin treats the 
join keys as addresses in a BitSet and uses the Lucene OpenBitSet 
implementation, which performs very well but is not sparse. So the BitSet 
memory is dictated by the magnitude of the join keys. For example, a bitset 
with a max join key of 200,000,000 will need 25 MB of memory. For this reason 
the BitSet join does not support long join keys. In order to keep memory usage 
down, the join keys should also be packed a

[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782223#comment-13782223
 ] 

Erick Erickson commented on SOLR-5287:
--

[~janhoy] Hmmm, I hadn't thought much about that. I know the REST API isn't 
complete yet, I think I saw a JIRA float by that you couldn't, for instance, 
update a field.

Hmmm, that would be an interesting way to pretty thoroughly make the REST API 
robust. Also, there wouldn't be any special code on the server; anything that 
had to be done to the server would be done by enhancing the REST API.

I had in mind a really quick hack here, but it really seems like the right 
thing to do is do it the right way. Siiigh. 

[~sar...@syr.edu] [~grant_ingers...@yahoo.com] What do you think? I took a 
quick look at the copyfield patch and it doesn't look like a huge effort to 
build on what's been done before for each new bit we want to support, or am I 
dreaming?

[~steffkes] This makes the UI part a bit different, instead of POSTing, you'd 
use one of the (perhaps new) REST API calls. Any comments one way or the other?



> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit






RE: NumericRangeTermsEnum

2013-09-30 Thread Uwe Schindler
Hi,

As Mike says, calling getTermsEnum directly is not intended (not at the query 
level). The subclassing workaround does not work because NumericRangeQuery has 
no public constructor, only static factories for the data types.

One correct way to handle all this (and it works with all MultiTermQueries 
around, so also TermRangeQuery, FuzzyQuery, WildcardQuery, ...): implement your 
own MultiTermQuery.RewriteMethod. In this class you implement how to rewrite 
the query to a query with all expanded terms. The trick here is to return some 
fake query that contains all the terms. This is done quite often in special 
query types that do fancy rewrites, e.g., one that may construct a phrase query 
containing wildcards. The code is not yet released as open source, but it works 
exactly that way - I will open an issue soon; here are some code parts:

  /** A fake query that is just used to collect all term instances for the 
{@link ScoringRewrite} API. */
  final class TermHolderQuery extends Query {
    private final ArrayList<Term> terms = new ArrayList<Term>();

@Override
public String toString(String defaultField) {
  return getClass().getSimpleName() + terms;
}

void add(Term term) {
  terms.add(term);
}

Term[] getTerms() {
  return terms.toArray(new Term[terms.size()]);
}
  }
  
  final class MultiPhraseMTQRewrite extends ScoringRewrite<TermHolderQuery> {
@Override
protected void addClause(TermHolderQuery topLevel, Term term, float boost) {
  topLevel.add(term);
}

@Override
protected TermHolderQuery getTopLevelQuery() {
  return new TermHolderQuery();
}
  }
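
A hypothetical usage sketch of the two classes above (the reader, field name 
and range bounds are illustrative, not part of the original mail):

  NumericRangeQuery<Long> q = NumericRangeQuery.newLongRange("price", 10L, 20L, true, true);
  q.setRewriteMethod(new MultiPhraseMTQRewrite());
  TermHolderQuery holder = (TermHolderQuery) q.rewrite(indexReader);
  Term[] expanded = holder.getTerms(); // every term the range expanded to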

But I still think you have an XY problem: please explain what you originally 
want to do, so we can help you implement it in "the Lucene way", without crazy 
workarounds. The above code is used for rewriting a query, so it is not 
misusing Lucene, but your use case seems strange to me.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -Original Message-
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Monday, September 30, 2013 10:36 PM
> To: chetvora
> Cc: Lucene/Solr dev
> Subject: Re: NumericRangeTermsEnum
> 
> Well, it was protected just because we didn't think apps needed to call it
> directly.
> 
> You could work around it ... subclass it and add your own public method that
> delegates to .getTermsEnum.  Or access it via reflection.
> 
> Alternatively, just call Query.rewrite() and the returned Query will reflect 
> the
> terms that the original query had expanded to (though, it may rewrite to
> MultiTermQueryWrapperFilter, which won't get you the terms ...).
> 
> But, can you describe more how you plan to create performant filters from
> this method?
> 
> Mike McCandless
> 
> http://blog.mikemccandless.com
> 
> 
> On Mon, Sep 30, 2013 at 1:18 PM, Chet Vora  wrote:
> > Mike
> >
> > We want to use the lower level Terms API to create some custom high
> > performant filters ... is there any reason why the method
> > NumericRangeQuery.getTermsEnum() was made protected in the API as
> > opposed to public?
> >
> > CV
> >
> >
> > On Fri, Sep 27, 2013 at 4:15 PM, Michael McCandless
> >  wrote:
> >>
> >> Normally you'd create a NumericRangeFilter/Query and just use that?
> >>
> >> Under the hood, Lucene uses that protected API to visit all matching
> >> terms...
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Thu, Sep 26, 2013 at 9:59 AM, Chet Vora 
> wrote:
> >> > Hi all
> >> >
> >> > I was trying to use the above enum to do some range search on dates...
> >> > this
> >> > enum is returned by NumericRangeQuery.getTermsEnum() but I
> realized
> >> > that this is a protected method of the class and since this is a
> >> > final class, I can't see how I can use it. Maybe I'm missing
> >> > something ?
> >> >
> >> > Would appreciate any pointers.
> >> >
> >> > Thanks
> >> >
> >> > CV
> >
> >
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional
> commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #984: POMs out of sync

2013-09-30 Thread Steve Rowe
On Sep 30, 2013, at 8:06 AM, Mark Miller  wrote:
> On Sep 30, 2013, at 12:52 AM, Steve Rowe  wrote:
> 
>> I think we should have a test somewhere that complains when we have multiple 
>> versions of the same dependency, across all Lucene and Solr modules.
> 
> I think that is a fine idea.
> 
> I'd like it for some things if we actually kept the versions somewhere else - 
> for instance, Hadoop dependencies should match across the mr module and the 
> core module.
> 
> Perhaps we could define versions for dependencies across multiple modules 
> that should probably match, in a prop file or ant file, and use system 
> property substitution for them in the ivy files.
> 
> For something like Hadoop, that would also make it simple to use Hadoop 1 
> rather than 2 with a single sys prop override. Same with some other 
> dependencies.

I like the idea - this would also make it unnecessary to test for multiple 
versions of the same dependency.  I'll whip up a patch.
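
A sketch of what that setup might look like (file name, key format and version 
are illustrative only, not the committed patch):

  # lucene/ivy-versions.properties -- one shared version per dependency
  /org.apache.hadoop/hadoop-common = 2.1.0-beta

  <!-- module ivy.xml: the rev is substituted from the property above -->
  <dependency org="org.apache.hadoop" name="hadoop-common"
              rev="${/org.apache.hadoop/hadoop-common}"/>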

Steve


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_40) - Build # 7687 - Still Failing!

2013-09-30 Thread Mark Miller
I'm committing a change.

- Mark

On Sep 30, 2013, at 4:35 PM, Steve Rowe  wrote:

> HttpSolrServer.java, line 417, uses forbidden api 
> o.a.commons.IOUtils.toString(InputStream):
> 
> 
> 
> On Sep 30, 2013, at 4:01 PM, Policeman Jenkins Server  
> wrote:
> 
>> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7687/
>> Java: 64bit/jdk1.7.0_40 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
>> 
>> All tests passed
>> 
>> Build Log:
>> [...truncated 26231 lines...]
>> -check-forbidden-java-apis:
>> [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
>> [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
>> [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
>> [forbidden-apis] Reading API signatures: 
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/base.txt
>> [forbidden-apis] Reading API signatures: 
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/servlet-api.txt
>> [forbidden-apis] Loading classes to check...
>> [forbidden-apis] Scanning for API signatures and dependencies...
>> [forbidden-apis] WARNING: The referenced class 
>> 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. 
>> Please fix the classpath!
>> [forbidden-apis] Forbidden method invocation: 
>> org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
>> charset]
>> [forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
>> (HttpSolrServer.java:415)
>> [forbidden-apis] WARNING: The referenced class 
>> 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
>> Please fix the classpath!
>> [forbidden-apis] WARNING: The referenced class 
>> 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
>> Please fix the classpath!
>> [forbidden-apis] WARNING: The referenced class 
>> 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
>> the classpath!
>> [forbidden-apis] Scanned 2477 (and 1385 related) class file(s) for forbidden 
>> API invocations (in 2.11s), 1 error(s).
>> 
>> BUILD FAILED
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:421: The 
>> following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:67: The 
>> following error occurred while executing this line:
>> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:285: Check 
>> for forbidden API calls failed, see log.
>> 
>> Total time: 46 minutes 27 seconds
>> Build step 'Invoke Ant' marked build as failure
>> Description set: Java: 64bit/jdk1.7.0_40 -XX:+UseCompressedOops 
>> -XX:+UseConcMarkSweepGC
>> Archiving artifacts
>> Recording test results
>> Email was triggered for: Failure
>> Sending email for trigger: Failure
>> 
>> 
>> 
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: dev-h...@lucene.apache.org
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-4787) Join Contrib

2013-09-30 Thread Joel Bernstein (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein updated SOLR-4787:
-

Attachment: SOLR-4787.patch

This patch resolves a compile issue in the last patch and includes the latest 
work that was done for the bjoin. The hjoin work that Kranti has been doing 
has not yet been included.

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
> <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib jars must be registered in the solrconfig.xml.
>  
>  
> *BitSetJoinQParserPlugin aka bjoin*
> The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
> HashSet to
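
A self-contained sketch of the hash-join idea described above (plain Java 
stand-ins, not the actual SOLR-4787 classes):

  import java.util.HashSet;
  import java.util.Set;

  public class HashJoinSketch {
    // presizing mirrors the 'size' local param: avoids HashSet rehashing
    static Set<Long> fromKeys(long[] keys, int expectedSize) {
      Set<Long> set = new HashSet<Long>(expectedSize);
      for (long k : keys) set.add(k);
      return set;
    }

    public static void main(String[] args) {
      Set<Long> from = fromKeys(new long[] {5L, 7L, 42L}, 16);
      // main-query side: a doc is kept only if its 'to' key is in the set
      System.out.println(from.contains(42L)); // true  -> doc passes the join filter
      System.out.println(from.contains(9L));  // false -> doc is filtered out
    }
  }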

[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782207#comment-13782207
 ] 

Jan Høydahl commented on SOLR-5287:
---

Bear in mind that hand-editing these on a production server may be undesirable 
in many companies, because they want to enforce strict versioning of configs 
and controlled official mechanisms for rolling out changes. Thus whatever we 
end up with, it should be possible to enable/disable this feature.

Further, I'd prefer that such GUI options (if enabled) work on top of the 
Schema REST API and the planned Config API. This way the GUI can be made more 
intelligent than simply a big {{<textarea>}}, and evolve into a very intuitive 
"Schema IDE" which makes it far easier to relate to things. Available 
analyzers, tokenizers and token filters could be fetched from SPI and be 
dragged & dropped into a fieldType etc. Telling people to start using managed 
schema to gain access to such a feature may not be a bad thing at all, if 
that's the way we're taking the product in 5.x anyway.

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system maliciously or not. I don't think this is an issue, 
> this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: NumericRangeTermsEnum

2013-09-30 Thread Michael McCandless
Well, it was protected just because we didn't think apps needed to
call it directly.

You could work around it ... subclass it and add your own public method
that delegates to .getTermsEnum.  Or access it via reflection.

Alternatively, just call Query.rewrite() and the returned Query will
reflect the terms that the original query had expanded to (though, it
may rewrite to MultiTermQueryWrapperFilter, which won't get you the
terms ...).
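
A minimal sketch of that route (assumes a 4.x IndexReader named reader in 
scope; field and bounds are illustrative): forcing a scoring boolean rewrite 
makes the expanded terms visible as TermQuery clauses (beware 
BooleanQuery.TooManyClauses for very wide ranges).

  NumericRangeQuery<Long> q = NumericRangeQuery.newLongRange("date", 0L, 100L, true, true);
  q.setRewriteMethod(MultiTermQuery.SCORING_BOOLEAN_QUERY_REWRITE);
  Query rewritten = q.rewrite(reader);              // BooleanQuery of TermQuery clauses
  for (BooleanClause c : ((BooleanQuery) rewritten).clauses()) {
    Term t = ((TermQuery) c.getQuery()).getTerm();  // one expanded term per clause
    // collect t ...
  }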

But, can you describe more how you plan to create performant filters
from this method?

Mike McCandless

http://blog.mikemccandless.com


On Mon, Sep 30, 2013 at 1:18 PM, Chet Vora  wrote:
> Mike
>
> We want to use the lower level Terms API to create some custom high
> performant filters ... is there any reason why the method
> NumericRangeQuery.getTermsEnum() was made protected in the API as opposed to
> public?
>
> CV
>
>
> On Fri, Sep 27, 2013 at 4:15 PM, Michael McCandless
>  wrote:
>>
>> Normally you'd create a NumericRangeFilter/Query and just use that?
>>
>> Under the hood, Lucene uses that protected API to visit all matching
>> terms...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Sep 26, 2013 at 9:59 AM, Chet Vora  wrote:
>> > Hi all
>> >
>> > I was trying to use the above enum to do some range search on dates...
>> > this
>> > enum is returned by NumericRangeQuery.getTermsEnum() but I realized that
>> > this is a protected method of the class and since this is a final class,
>> > I
>> > can't see how I can use it. Maybe I'm missing something ?
>> >
>> > Would appreciate any pointers.
>> >
>> > Thanks
>> >
>> > CV
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_40) - Build # 7687 - Still Failing!

2013-09-30 Thread Steve Rowe
HttpSolrServer.java, line 417, uses forbidden api 
o.a.commons.IOUtils.toString(InputStream):
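
The likely shape of such a fix (sketch only; 'respBody' is an assumed local 
variable name): use the commons-io overload that takes an explicit charset 
rather than the platform default flagged by forbidden-apis.

  String body = IOUtils.toString(respBody, "UTF-8"); // charset-explicit overload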



On Sep 30, 2013, at 4:01 PM, Policeman Jenkins Server  
wrote:

> Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7687/
> Java: 64bit/jdk1.7.0_40 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC
> 
> All tests passed
> 
> Build Log:
> [...truncated 26231 lines...]
> -check-forbidden-java-apis:
> [forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
> [forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
> [forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
> [forbidden-apis] Reading API signatures: 
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/base.txt
> [forbidden-apis] Reading API signatures: 
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/servlet-api.txt
> [forbidden-apis] Loading classes to check...
> [forbidden-apis] Scanning for API signatures and dependencies...
> [forbidden-apis] WARNING: The referenced class 
> 'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. 
> Please fix the classpath!
> [forbidden-apis] Forbidden method invocation: 
> org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
> charset]
> [forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
> (HttpSolrServer.java:415)
> [forbidden-apis] WARNING: The referenced class 
> 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
> Please fix the classpath!
> [forbidden-apis] WARNING: The referenced class 
> 'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. 
> Please fix the classpath!
> [forbidden-apis] WARNING: The referenced class 
> 'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
> the classpath!
> [forbidden-apis] Scanned 2477 (and 1385 related) class file(s) for forbidden 
> API invocations (in 2.11s), 1 error(s).
> 
> BUILD FAILED
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:421: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:67: The 
> following error occurred while executing this line:
> /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:285: Check 
> for forbidden API calls failed, see log.
> 
> Total time: 46 minutes 27 seconds
> Build step 'Invoke Ant' marked build as failure
> Description set: Java: 64bit/jdk1.7.0_40 -XX:+UseCompressedOops 
> -XX:+UseConcMarkSweepGC
> Archiving artifacts
> Recording test results
> Email was triggered for: Failure
> Sending email for trigger: Failure
> 
> 
> 
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782184#comment-13782184
 ] 

Joel Bernstein commented on SOLR-4787:
--

Yes, I noticed the latest patch is referring to ByteArray, which isn't present. 
I'm going to be putting up a new patch shortly to resolve this. It will also 
include the latest work done on the BitSet join.

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
> <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib jars must be registered in the solrconfig.xml.
>  
>  
> *BitSetJoinQParserPlugin aka bjoin*
> The bjoin behaves exactly like the hjoin but 

[jira] [Commented] (SOLR-4787) Join Contrib

2013-09-30 Thread Peter Keegan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782179#comment-13782179
 ] 

Peter Keegan commented on SOLR-4787:


I'm seeing the ByteArray compilation problem, too. Where would I find this class?

> Join Contrib
> 
>
> Key: SOLR-4787
> URL: https://issues.apache.org/jira/browse/SOLR-4787
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 4.2.1
>Reporter: Joel Bernstein
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: SOLR-4787-deadlock-fix.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787.patch, SOLR-4787.patch, SOLR-4787.patch, 
> SOLR-4787-pjoin-long-keys.patch, SOLR-4797-hjoin-multivaluekeys-trunk.patch
>
>
> This contrib provides a place where different join implementations can be 
> contributed to Solr. This contrib currently includes 3 join implementations. 
> The initial patch was generated from the Solr 4.3 tag. Because of changes in 
> the FieldCache API this patch will only build with Solr 4.2 or above.
> *HashSetJoinQParserPlugin aka hjoin*
> The hjoin provides a join implementation that filters results in one core 
> based on the results of a search in another core. This is similar in 
> functionality to the JoinQParserPlugin but the implementation differs in a 
> couple of important ways.
> The first way is that the hjoin is designed to work with int and long join 
> keys only. So, in order to use hjoin, int or long join keys must be included 
> in both the to and from core.
> The second difference is that the hjoin builds memory structures that are 
> used to quickly connect the join keys. So, the hjoin will need more memory 
> than the JoinQParserPlugin to perform the join.
> The main advantage of the hjoin is that it can scale to join millions of keys 
> between cores and provide sub-second response time. The hjoin should work 
> well with up to two million results from the fromIndex and tens of millions 
> of results from the main query.
> The hjoin supports the following features:
> 1) Both lucene query and PostFilter implementations. A *"cost"* > 99 will 
> turn on the PostFilter. The PostFilter will typically outperform the Lucene 
> query when the main query results have been narrowed down.
> 2) With the lucene query implementation there is an option to build the 
> filter with threads. This can greatly improve the performance of the query if 
> the main query index is very large. The "threads" parameter turns on 
> threading. For example *threads=6* will use 6 threads to build the filter. 
> This will setup a fixed threadpool with six threads to handle all hjoin 
> requests. Once the threadpool is created the hjoin will always use it to 
> build the filter. Threading does not come into play with the PostFilter.
> 3) The *size* local parameter can be used to set the initial size of the 
> hashset used to perform the join. If this is set above the number of results 
> from the fromIndex then you can avoid hashset resizing, which improves 
> performance.
> 4) Nested filter queries. The local parameter "fq" can be used to nest a 
> filter query within the join. The nested fq will filter the results of the 
> join query. This can point to another join to support nested joins.
> 5) Full caching support for the lucene query implementation. The filterCache 
> and queryResultCache should work properly even with deep nesting of joins. 
> Only the queryResultCache comes into play with the PostFilter implementation 
> because PostFilters are not cacheable in the filterCache.
> The syntax of the hjoin is similar to the JoinQParserPlugin except that the 
> plugin is referenced by the string "hjoin" rather than "join".
> fq=\{!hjoin fromIndex=collection2 from=id_i to=id_i threads=6 
> fq=$qq\}user:customer1&qq=group:5
> The example filter query above will search the fromIndex (collection2) for 
> "user:customer1" applying the local fq parameter to filter the results. The 
> lucene filter query will be built using 6 threads. This query will generate a 
> list of values from the "from" field that will be used to filter the main 
> query. Only records from the main query, where the "to" field is present in 
> the "from" list will be included in the results.
> The solrconfig.xml in the main query core must contain the reference to the 
> hjoin.
> <queryParser name="hjoin" class="org.apache.solr.joins.HashSetJoinQParserPlugin"/>
> And the join contrib jars must be registered in the solrconfig.xml.
>  
>  
> *BitSetJoinQParserPlugin aka bjoin*
> The bjoin behaves exactly like the hjoin but uses a BitSet instead of a 
> HashSet to perform the underlying join. Because of this the bjoin is much 
> faster and can provide s

[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_40) - Build # 7687 - Still Failing!

2013-09-30 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7687/
Java: 64bit/jdk1.7.0_40 -XX:+UseCompressedOops -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 26231 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2477 (and 1385 related) class file(s) for forbidden 
API invocations (in 2.11s), 1 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:421: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:67: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:285: Check 
for forbidden API calls failed, see log.

Total time: 46 minutes 27 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0_40 -XX:+UseCompressedOops 
-XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #985: POMs out of sync

2013-09-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/985/

No tests ran.

Build Log:
[...truncated 11706 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.7.0) - Build # 873 - Failure!

2013-09-30 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/873/
Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 34766 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2477 (and 1385 related) class file(s) for forbidden 
API invocations (in 7.31s), 1 error(s).

BUILD FAILED
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:421: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/build.xml:67: The following 
error occurred while executing this line:
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build.xml:285: Check for 
forbidden API calls failed, see log.

Total time: 116 minutes 47 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.7.0 -XX:-UseCompressedOops -XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-3069) Lucene should have an entirely memory resident term dictionary

2013-09-30 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782141#comment-13782141
 ] 

Simon Willnauer commented on LUCENE-3069:
-

Nice one! I am happy that this one made it in, 2.5 years after opening! Great 
work Han!!

> Lucene should have an entirely memory resident term dictionary
> --
>
> Key: LUCENE-3069
> URL: https://issues.apache.org/jira/browse/LUCENE-3069
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index, core/search
>Affects Versions: 4.0-ALPHA
>Reporter: Simon Willnauer
>Assignee: Han Jiang
>  Labels: gsoc2013
> Fix For: 5.0, 4.5
>
> Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, LUCENE-3069.patch, 
> LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.
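
A tiny sketch of building and probing a memory-resident FST with the 4.x API 
(the term bytes and output value are illustrative):

  PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
  Builder<Long> builder = new Builder<Long>(FST.INPUT_TYPE.BYTE1, outputs);
  IntsRef scratch = new IntsRef();
  builder.add(Util.toIntsRef(new BytesRef("term"), scratch), 42L); // term -> payload
  FST<Long> fst = builder.finish();
  Long value = Util.get(fst, new BytesRef("term"));                // exact lookup: 42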



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-09-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782136#comment-13782136
 ] 

Michael McCandless commented on LUCENE-5248:


I think the lack of RAM tracking in RALD is an important thing to fix,
separately from optimizing how RALD uses its RAM.  Especially the
spooky case where one update, itself consuming a tiny amount of RAM, can
resolve to millions of documents, consuming tons of RAM in RALD.

Today, there are three reasons why we call
BufferedDeletesStream.applyDeletes, which "resolves" the Term/Query
passed to deleteDocuments/updateNumericDocValue:

  * We've hit IW's RAM buffer

  * We're opening a new NRT reader, and applyAllDeletes=true

  * A merge is kicking off

As things stand now, the first case will resolve the updates and move
them into RALD but not write them to disk, while the other two cases
will write them to disk and clear RALD's maps, I think?  Maybe a simple
fix is to also write to disk in case 1?

But, if the segment is large, we can still have a big spike as we
populate those Maps with millions of docs worth of updates?

bq. I don't see a reason why we need to resolve the updates when we register 
them with RALD 

If we "resolve & move the updates to disk" as a single operation (ie,
fix the first case above), then I think we can just keep the logic in
BD, but have it immediately move the updates to disk, rather than
buffer them up in RALD, except in the "this segment is merging" case?

bq. When applying them in writeLiveDocs, we will manage multiple DocsEnums (one 
per NumericUpdate.term) and iterate them in order

I think this is a neat idea; though I would worry about the RAM
required for N DocsEnums where N is possibly quite large...
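
A runnable sketch of the "N enums iterated in doc order" idea (plain int 
arrays stand in for per-term DocsEnums; the ordering rule makes the 
later-applied term win on a shared doc):

  import java.util.PriorityQueue;

  public class DocOrderMergeSketch {
    static final class Cursor {
      final int[] docs; final int ord; int pos; // ascending doc ids; term order
      Cursor(int[] docs, int ord) { this.docs = docs; this.ord = ord; }
      int doc() { return docs[pos]; }
    }

    public static void main(String[] args) {
      // doc asc; for equal docs the later-applied term (higher ord) pops first
      PriorityQueue<Cursor> pq = new PriorityQueue<>((a, b) ->
          a.doc() != b.doc() ? Integer.compare(a.doc(), b.doc())
                             : Integer.compare(b.ord, a.ord));
      pq.add(new Cursor(new int[] {2, 100}, 0)); // termA updates docs 2, 100
      pq.add(new Cursor(new int[] {2, 7},   1)); // termB updates docs 2, 7 (wins on 2)
      int lastDoc = -1;
      while (!pq.isEmpty()) {
        Cursor c = pq.poll();
        if (c.doc() != lastDoc) {                // first pop per doc is the winner
          lastDoc = c.doc();
          System.out.println("doc=" + lastDoc + " value from term ord=" + c.ord);
        }
        if (++c.pos < c.docs.length) pq.add(c);  // advance this term's enum
      }
    }
  }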


> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, therefore need to be 
> efficient w.r.t. memory consumption.
> +Map>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-4.x-Linux (64bit/jdk1.8.0-ea-b106) - Build # 7594 - Failure!

2013-09-30 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-4.x-Linux/7594/
Java: 64bit/jdk1.8.0-ea-b106 -XX:-UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 26524 lines...]
-check-forbidden-java-apis:
[forbidden-apis] WARNING: Reading class file in Java 8 format. This may cause 
problems!
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:416)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2480 (and 1374 related) class file(s) for forbidden 
API invocations (in 1.32s), 1 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:427: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/build.xml:67: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-4.x-Linux/solr/build.xml:285: Check for 
forbidden API calls failed, see log.

Total time: 41 minutes 29 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 64bit/jdk1.8.0-ea-b106 -XX:-UseCompressedOops 
-XX:+UseParallelGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.7.0_40) - Build # 7686 - Failure!

2013-09-30 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-Linux/7686/
Java: 32bit/jdk1.7.0_40 -client -XX:+UseConcMarkSweepGC

All tests passed

Build Log:
[...truncated 26242 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2477 (and 1385 related) class file(s) for forbidden 
API invocations (in 1.72s), 1 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:421: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:67: The following 
error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/solr/build.xml:285: Check 
for forbidden API calls failed, see log.

Total time: 51 minutes 49 seconds
Build step 'Invoke Ant' marked build as failure
Description set: Java: 32bit/jdk1.7.0_40 -client -XX:+UseConcMarkSweepGC
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: VOTE: RC1 Release apache-solr-ref-guide-4.5.pdf

2013-09-30 Thread Chris Hostetter

: Subject: VOTE: RC1 Release apache-solr-ref-guide-4.5.pdf

: 
https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.5-RC1/
: 
: $ cat apache-solr-ref-guide-4.5.pdf.sha1
: 823859d10a794fc399afd12981c40b167c076b13  apache-solr-ref-guide-4.5.pdf


+1


-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782042#comment-13782042
 ] 

Joel Bernstein commented on SOLR-4816:
--

I've been doing some more testing of the error handling. If CloudSolrServer 
encounters an error from any of its shards, it gathers them up into a 
RouteException and throws it. This exception extends SolrException and can be 
treated as such.

When getMessage is called, you see a typical update error returned:

ERROR: [doc=1100] Error adding field 'test_i'='bad' msg=For input string: "bad"

What we don't know is which server to check to get the full stack trace. But it 
doesn't seem like we knew that info in CloudSolrServer prior to this patch. 
RouteException, when made public, will tell us this. 

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList<Object> response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5263) CloudSolrServer URL cache update race

2013-09-30 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13782033#comment-13782033
 ] 

Mark Miller commented on SOLR-5263:
---

If there are no objections, I'll commit this shortly.

> CloudSolrServer URL cache update race
> -
>
> Key: SOLR-5263
> URL: https://issues.apache.org/jira/browse/SOLR-5263
> Project: Solr
>  Issue Type: Bug
>  Components: clients - java, SolrCloud
>Affects Versions: 4.4, 4.5, 5.0
>Reporter: Jessica Cheng
>Assignee: Mark Miller
>  Labels: solrcloud, solrj
> Fix For: 5.0, 4.6
>
> Attachments: SOLR-5263.patch
>
>
> In CloudSolrServer.request, urlLists (and the like) is updated if 
> lastClusterStateHashCode is different from the current hash code of 
> clusterState. However, each time this happen, only the cache entry for the 
> current collection being requested is updated. In the following condition 
> this causes a race:
> query collection A so a cache entry exists
> update collection A
> query collection B, request method notices the hash code changed and update 
> cache for collection B, updates lastClusterStateHashCode
> query collection A, since lastClusterStateHashCode has been updated, no 
> update for cache for collection A even though it's stale
> Can fix one of two ways:
> 1. Track lastClusterStateHashCode per collection and lazily update each entry
> 2. Every time we notice lastClusterStateHashCode != clusterState.hashCode(),
>2a. rebuild the entire cache for all collections, or
>2b. clear the current cache for all collections
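
A sketch of option (1) (names are illustrative, not the actual CloudSolrServer 
fields):

  import java.util.Map;
  import java.util.concurrent.ConcurrentHashMap;

  final class PerCollectionStateTracker {
    private final Map<String, Integer> lastHash = new ConcurrentHashMap<String, Integer>();

    void maybeRefresh(String collection, int clusterStateHash) {
      Integer prev = lastHash.get(collection);
      if (prev == null || prev.intValue() != clusterStateHash) {
        rebuildUrlCacheFor(collection);              // only the stale entry is rebuilt
        lastHash.put(collection, clusterStateHash);  // tracked lazily, per collection
      }
    }

    private void rebuildUrlCacheFor(String collection) {
      /* placeholder for the real urlLists rebuild */
    }
  }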



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-trunk-Java7 - Build # 4362 - Still Failing

2013-09-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-Java7/4362/

All tests passed

Build Log:
[...truncated 26239 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.7
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.7
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2477 (and 1385 related) class file(s) for forbidden 
API invocations (in 2.04s), 1 error(s).

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:421:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/build.xml:67:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-Java7/solr/build.xml:285:
 Check for forbidden API calls failed, see log.

Total time: 66 minutes 33 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

VOTE: RC1 Release apache-solr-ref-guide-4.5.pdf

2013-09-30 Thread Chris Hostetter


(Big thanks to everyone who chipped in in reviewing & improving on RC0!)

Please vote to release the following artifacts (RC1) as the Apache Solr 
Reference Guide for 4.5...


https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.5-RC1/

$ cat apache-solr-ref-guide-4.5.pdf.sha1
823859d10a794fc399afd12981c40b167c076b13  apache-solr-ref-guide-4.5.pdf


(When reviewing the PDF, please don't hesitate to point out any typos or 
formatting glitches or any other problems with the subject matter. Re-spinning 
a new RC is trivial, so in my opinion the bar is very low in terms of what 
things are worth fixing before release.)







-Hoss

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: NumericRangeTermsEnum

2013-09-30 Thread Chet Vora
Mike

We want to use the lower-level Terms API to create some custom
high-performance filters ... is there any reason why the method
NumericRangeQuery.getTermsEnum() was made protected in the API as opposed
to public?

CV


On Fri, Sep 27, 2013 at 4:15 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> Normally you'd create a NumericRangeFilter/Query and just use that?
>
> Under the hood, Lucene uses that protected API to visit all matching
> terms...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Sep 26, 2013 at 9:59 AM, Chet Vora  wrote:
> > Hi all
> >
> > I was trying to use the above enum to do some range search on dates...
> this
> > enum is returned by NumericRangeQuery.getTermsEnum() but I realized that
> > this is a protected method of the class and since this is a final class,
> I
> > can't see how I can use it. Maybe I'm missing something ?
> >
> > Would appreciate any pointers.
> >
> > Thanks
> >
> > CV
>
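
A minimal sketch of the public-API route Mike describes, assuming a
hypothetical long field "timestamp" (the field name and bounds are
illustrative); Lucene visits the matching trie terms internally via the
protected getTermsEnum():

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.NumericRangeQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;

public class RangeSearch {
  /** Returns the top 10 docs whose "timestamp" falls in [min, max]. */
  public static TopDocs search(Directory dir, long min, long max) throws Exception {
    DirectoryReader reader = DirectoryReader.open(dir);
    try {
      IndexSearcher searcher = new IndexSearcher(reader);
      // both endpoints inclusive
      NumericRangeQuery<Long> q =
          NumericRangeQuery.newLongRange("timestamp", min, max, true, true);
      return searcher.search(q, 10);
    } finally {
      reader.close();
    }
  }
}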


[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-30 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781991#comment-13781991
 ] 

Michael McCandless commented on LUCENE-5189:


bq.  I'd not be happy with this being released tomorrow.

Can you give more details here?  I.e., do you consider the optimizations 
necessary for release?  Or are there other things?

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release Lucene/Solr 4.5.0 RC5

2013-09-30 Thread Yonik Seeley
+1

Thanks for all the patience with re-spins Adrien!

-Yonik
http://lucidworks.com

On Sat, Sep 28, 2013 at 12:59 PM, Adrien Grand  wrote:
> Please vote to release the following artifacts:
>   
> http://people.apache.org/~jpountz/staging_area/lucene-solr-4.5.0-RC5-rev1527178/
>
> Here is my +1.
>
> --
> Adrien

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-4.x-Java7 - Build # 1637 - Failure

2013-09-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java7/1637/

All tests passed

Build Log:
[...truncated 26448 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2480 (and 1372 related) class file(s) for forbidden 
API invocations (in 2.11s), 1 error(s).

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/build.xml:427:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/build.xml:67:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java7/solr/build.xml:285:
 Check for forbidden API calls failed, see log.

Total time: 68 minutes 50 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: [VOTE] Release Lucene/Solr 4.5.0 RC5

2013-09-30 Thread Martijn v Groningen
+1 smoke tester is happy.


On 30 September 2013 14:37, Michael McCandless wrote:

> +1, smoke tester is happy.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Sat, Sep 28, 2013 at 12:59 PM, Adrien Grand  wrote:
> > Please vote to release the following artifacts:
> >
> http://people.apache.org/~jpountz/staging_area/lucene-solr-4.5.0-RC5-rev1527178/
> >
> > Here is my +1.
> >
> > --
> > Adrien
> >
> > -
> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: dev-h...@lucene.apache.org
> >
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


-- 
Met vriendelijke groet,

Martijn van Groningen


[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-09-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781966#comment-13781966
 ] 

Shai Erera commented on LUCENE-5248:


I thought about it some more ... perhaps there's a way to keep the updates 
without holding (almost) any RAM. Today the code follows the delete path, 
resolving the delete terms to the docIDs they affect. With deletes that's 
easy, and the RAM overhead is low.

I don't see a reason why we need to resolve the updates when we register them 
with RALD (but perhaps I'm overlooking something), as they aren't used 
in-memory. If a Reader needs to see them, we flush them to disk, unlike 
liveDocs which are shared in-memory and not flushed to disk. So perhaps we 
could keep in RALD a Map<String,List<NumericUpdate>>, mapping a field to all 
its numeric updates. When applying them in writeLiveDocs, we will manage 
multiple DocsEnums (one per NumericUpdate.term) and iterate them in order, 
ensuring we apply the most recent update to the document pointed at in the 
current iteration. So if termA affects docs 1,3,6 and termB affects 2,3,5,6, 
we iterate on both, positioning termA on 1 and termB on 2. Since they don't 
match, we return the update value for doc1. When both are positioned on doc3, 
we apply the update of termB (as it came last). doc4 is assumed not updated, 
and so forth.
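
A rough, self-contained sketch of that merge (not Lucene code: each term's 
DocsEnum is modeled as a sorted int[] of docIDs, and ord is the update's 
arrival order, so on a shared doc the largest ord wins):

import java.util.Comparator;
import java.util.PriorityQueue;

public class MergedUpdates {
  static final class Cursor {
    final int[] docs; final long value; final int ord; int pos = 0;
    Cursor(int[] docs, long value, int ord) {
      this.docs = docs; this.value = value; this.ord = ord;
    }
    int doc() { return docs[pos]; }
  }

  public static void apply(Cursor[] cursors) {
    // order by current doc, then by update arrival order
    PriorityQueue<Cursor> pq = new PriorityQueue<Cursor>(
        Math.max(1, cursors.length), new Comparator<Cursor>() {
      public int compare(Cursor a, Cursor b) {
        return a.doc() != b.doc() ? a.doc() - b.doc() : a.ord - b.ord;
      }
    });
    for (Cursor c : cursors) if (c.docs.length > 0) pq.add(c);
    while (!pq.isEmpty()) {
      int doc = pq.peek().doc();
      long winner = 0;
      // pop every cursor parked on this doc in ord order; the last poll
      // is the most recent update, so its value wins
      while (!pq.isEmpty() && pq.peek().doc() == doc) {
        Cursor c = pq.poll();
        winner = c.value;
        if (++c.pos < c.docs.length) pq.add(c); // re-seed the advanced cursor
      }
      System.out.println("doc" + doc + " -> " + winner);
    }
  }

  public static void main(String[] args) {
    // termA affects docs 1,3,6; termB (the later update) affects 2,3,5,6
    apply(new Cursor[] {
        new Cursor(new int[] {1, 3, 6}, 7L, 0),
        new Cursor(new int[] {2, 3, 5, 6}, 9L, 1),
    });
    // prints doc1 -> 7, doc2 -> 9, doc3 -> 9, doc5 -> 9, doc6 -> 9
  }
}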

This is definitely hairy, but consumes much less RAM. Also, in terms of 
performance I don't think we lose anything, because it's not like we will 
resolve the same updates multiple times (once per segment, but that's done 
anyway?). It's just hairy code, though I'm not sure it's any hairier than the 
changes commitMergedDeletes would require with the proposal detailed above. 
For commitMergedDeletes, we won't need to resolve any updates, just record them 
in the merged RALD; they will be resolved (using the merged DocsEnum) when it's 
time to writeLiveDocs that segment?

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map<String,Map<Integer,Long>>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, so the structure 
> needs to be efficient w.r.t. memory consumption.
> +Map<Integer,Map<String,Long>>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781937#comment-13781937
 ] 

Joel Bernstein commented on SOLR-4816:
--

I've been mulling over how big an issue this is. If you get a number of 
exceptions from different servers now, you'd only see one. You'd fix the data 
and re-feed the batch; then you'd see the next exception and have to fix 
that. That's no different from how you'd have to work if you had multiple 
exceptions in a batch on a single server. So I don't think it's such a large 
issue. I agree with Mark's plan.

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-09-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781933#comment-13781933
 ] 

Shai Erera commented on LUCENE-5248:


Another option, maybe even unrelated to this issue (but relevant to field 
updates in general), is to take ReaderAndLiveDocs into IW's RAM accounting. 
Today it's irrelevant since it holds 1 bit per document, but field updates 
could hold many more (several longs). We could add to RLD.sizeInBytes, taking 
only the field updates into account, and writeLiveDocs if it exceeds the 
buffer. Thing is, flush-by-RAM, as far as I understand, does not involve 
flushing RALD, so this change may not be that simple. But I'll let Mike/Rob, 
who know this code better than I do, comment on that.
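
As a rough illustration (hypothetical layout: suppose each field's pending 
updates were held as parallel int[] docs / long[] values arrays), the RAM to 
charge against the IW buffer could be estimated like this:

import java.util.Map;
import org.apache.lucene.util.RamUsageEstimator;

public class UpdatesRam {
  /** Approximate heap bytes held by per-field docs/values arrays. */
  public static long sizeInBytes(Map<String, int[]> docsPerField,
                                 Map<String, long[]> valuesPerField) {
    long bytes = 0;
    for (int[] docs : docsPerField.values()) {
      bytes += RamUsageEstimator.sizeOf(docs);
    }
    for (long[] values : valuesPerField.values()) {
      bytes += RamUsageEstimator.sizeOf(values);
    }
    return bytes; // compare against IW's RAM buffer to decide when to flush
  }
}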

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map<String,Map<Integer,Long>>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, so the structure 
> needs to be efficient w.r.t. memory consumption.
> +Map<Integer,Map<String,Long>>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-09-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781924#comment-13781924
 ] 

Shai Erera commented on LUCENE-5248:


I discussed briefly the details of the first data structure with Adrien and 
Rob, and here's a proposal:

* Conceptually hold int[] and long[] arrays (for docs and values 
respectively), per field.
* When an update is applied to a document, we append an entry to the arrays, 
not bothering to update an existing entry.
** E.g. if updates come to docs 1,2,1,2,3,1,2,3, then the arrays will hold:
*** docs: {{[1,2,1,2,3,1,2,3]}}
*** values: {{[5,4,1,3,5,6,2,9]}}
** So the result of the updates should be doc1=6, doc2=2 and doc3=9.
* In writeLiveDocs we stable-sort the two arrays and take the last value of 
each document. The sort will yield:
** docs: {{[1,1,1,2,2,2,3,3]}}
** values: {{[5,1,6,4,3,2,5,9]}}
** The iterator will take the last value of each document.
* To manage the data structure:
** A FieldUpdates class which holds the ints/longs, sorts them, and provides 
an iterator-like API, e.g. nextDoc()/nextValue(), which takes the last value 
for each document (a sketch follows this list).
** For docs, use PackedInts.getMutable (with 
bitsPerValue=PackedInts.bitsRequired(maxDoc - 1)).
** For values, use GrowableWriter.
** In ReaderAndLiveDocs, hold a Map<String,FieldUpdates>: a per-field 
FieldUpdates instance.
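
A minimal sketch of that FieldUpdates idea (plain growable arrays stand in 
for PackedInts/GrowableWriter here, an assumption for brevity):

import java.util.Arrays;
import java.util.Comparator;

public class FieldUpdates {
  private int[] docs = new int[8];
  private long[] values = new long[8];
  private int size = 0;

  /** Unordered writes; duplicate docs are simply appended. */
  public void add(int doc, long value) {
    if (size == docs.length) {
      docs = Arrays.copyOf(docs, size * 2);
      values = Arrays.copyOf(values, size * 2);
    }
    docs[size] = doc;
    values[size] = value;
    size++;
  }

  /** Stable-sorts by doc, then emits the last value seen per doc. */
  public void printFinalValues() {
    Integer[] order = new Integer[size];
    for (int i = 0; i < size; i++) order[i] = i;
    // Arrays.sort on objects is stable, so insertion order survives per doc
    Arrays.sort(order, new Comparator<Integer>() {
      public int compare(Integer a, Integer b) { return docs[a] - docs[b]; }
    });
    for (int i = 0; i < size; i++) {
      boolean lastForDoc = i == size - 1 || docs[order[i]] != docs[order[i + 1]];
      if (lastForDoc) {
        System.out.println("doc" + docs[order[i]] + "=" + values[order[i]]);
      }
    }
  }

  public static void main(String[] args) {
    FieldUpdates u = new FieldUpdates();
    int[] d = {1, 2, 1, 2, 3, 1, 2, 3};
    long[] v = {5, 4, 1, 3, 5, 6, 2, 9};
    for (int i = 0; i < d.length; i++) u.add(d[i], v[i]);
    u.printFinalValues(); // doc1=6, doc2=2, doc3=9
  }
}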

As for the second structure, it's unrelated to the first (i.e. they can be 
improved separately), though it still suffers from the same issues -- a single 
update that comes in while the segment is merging can affect millions of 
documents and therefore can be inefficient too. One way to solve it is to use 
the same structure mentioned above and manage multiple iterators in 
IW.commitMergedDeletes (what's a bit more hair to this already hairy code? ;)), 
so that for every document we "handle", we also iterate in parallel on all the 
fields.

I'll start with the first structure, and then if it works well, I'll try to 
apply it to the second. If you have comments/suggestions on how else to save 
the updates, feel free to propose.

> Improve the data structure used in ReaderAndLiveDocs to hold the updates
> 
>
> Key: LUCENE-5248
> URL: https://issues.apache.org/jira/browse/LUCENE-5248
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
>
> Currently ReaderAndLiveDocs holds the updates in two structures:
> +Map<String,Map<Integer,Long>>+
> Holds a mapping from each field, to all docs that were updated and their 
> values. This structure is updated when applyDeletes is called, and needs to 
> satisfy several requirements:
> # Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
> in that order, and termA affects doc=100 and termB doc=2, then the updates 
> are applied in that order, meaning we cannot rely on updates coming in order.
> # Same document may be updated multiple times, either by same term (e.g. 
> several calls to IW.updateNDV) or by different terms. Last update wins.
> # Sequential read: when writing the updates to the Directory 
> (fieldsConsumer), we iterate on the docs in-order and for each one check if 
> it's updated and if not, pull its value from the current DV.
> # A single update may affect several million documents, so the structure 
> needs to be efficient w.r.t. memory consumption.
> +Map<Integer,Map<String,Long>>+
> Holds a mapping from a document, to all the fields that it was updated in and 
> the updated value for each field. This is used by IW.commitMergedDeletes to 
> apply the updates that came in while the segment was merging. The 
> requirements this structure needs to satisfy are:
> # Access in doc order: this is how commitMergedDeletes works.
> # One-pass: we visit a document once (currently) and so if we can, it's 
> better if we know all the fields in which it was updated. The updates are 
> applied to the merged ReaderAndLiveDocs (where they are stored in the first 
> structure mentioned above).
> Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS] Lucene-Solr-Tests-4.x-Java6 - Build # 2040 - Still Failing

2013-09-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Tests-4.x-Java6/2040/

All tests passed

Build Log:
[...truncated 25828 lines...]
-check-forbidden-java-apis:
[forbidden-apis] Reading bundled API signatures: jdk-unsafe-1.6
[forbidden-apis] Reading bundled API signatures: jdk-deprecated-1.6
[forbidden-apis] Reading bundled API signatures: commons-io-unsafe-2.1
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/lucene/tools/forbiddenApis/base.txt
[forbidden-apis] Reading API signatures: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/lucene/tools/forbiddenApis/servlet-api.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.collation.ICUCollationKeyAnalyzer' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] Forbidden method invocation: 
org.apache.commons.io.IOUtils#toString(java.io.InputStream) [Uses default 
charset]
[forbidden-apis]   in org.apache.solr.client.solrj.impl.HttpSolrServer 
(HttpSolrServer.java:415)
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProviderFactory' cannot be loaded. Please 
fix the classpath!
[forbidden-apis] WARNING: The referenced class 
'org.apache.lucene.analysis.uima.ae.AEProvider' cannot be loaded. Please fix 
the classpath!
[forbidden-apis] Scanned 2480 (and 1369 related) class file(s) for forbidden 
API invocations (in 2.67s), 1 error(s).

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:427:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/build.xml:67:
 The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-4.x-Java6/solr/build.xml:285:
 Check for forbidden API calls failed, see log.

Total time: 68 minutes 29 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781914#comment-13781914
 ] 

Shai Erera commented on LUCENE-5189:


I opened LUCENE-5248 to handle the data structure (irrespective of whether we 
choose to backport first or not).

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5248) Improve the data structure used in ReaderAndLiveDocs to hold the updates

2013-09-30 Thread Shai Erera (JIRA)
Shai Erera created LUCENE-5248:
--

 Summary: Improve the data structure used in ReaderAndLiveDocs to 
hold the updates
 Key: LUCENE-5248
 URL: https://issues.apache.org/jira/browse/LUCENE-5248
 Project: Lucene - Core
  Issue Type: Improvement
  Components: core/index
Reporter: Shai Erera
Assignee: Shai Erera


Currently ReaderAndLiveDocs holds the updates in two structures:

+Map<String,Map<Integer,Long>>+
Holds a mapping from each field, to all docs that were updated and their 
values. This structure is updated when applyDeletes is called, and needs to 
satisfy several requirements:

# Un-ordered writes: if a field "f" is updated by two terms, termA and termB, 
in that order, and termA affects doc=100 and termB doc=2, then the updates are 
applied in that order, meaning we cannot rely on updates coming in order.
# Same document may be updated multiple times, either by same term (e.g. 
several calls to IW.updateNDV) or by different terms. Last update wins.
# Sequential read: when writing the updates to the Directory (fieldsConsumer), 
we iterate on the docs in-order and for each one check if it's updated and if 
not, pull its value from the current DV.
# A single update may affect several million documents, so the structure needs 
to be efficient w.r.t. memory consumption.

+Map<Integer,Map<String,Long>>+
Holds a mapping from a document, to all the fields that it was updated in and 
the updated value for each field. This is used by IW.commitMergedDeletes to 
apply the updates that came in while the segment was merging. The requirements 
this structure needs to satisfy are:

# Access in doc order: this is how commitMergedDeletes works.
# One-pass: we visit a document once (currently) and so if we can, it's better 
if we know all the fields in which it was updated. The updates are applied to 
the merged ReaderAndLiveDocs (where they are stored in the first structure 
mentioned above).

Comments with proposals will follow next.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-30 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781909#comment-13781909
 ] 

Mark Miller commented on SOLR-4816:
---

Let's make a new issue to make these public in 4.6. The tests can access them 
because they are package-private - they should be public and static.
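
A simplified sketch of the visibility point (not the real class body; the 
nested type is reduced to a stub):

package org.apache.solr.client.solrj.impl;

public class CloudSolrServer {
  // package-private: tests living in this same package can use it, but
  // client code in other packages cannot catch or inspect it
  static class RouteException extends Exception {
    RouteException(String msg) { super(msg); }
  }
  // the proposal above: declare it "public static" so callers can reference it
}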

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-30 Thread Joel Bernstein (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781889#comment-13781889
 ] 

Joel Bernstein commented on SOLR-4816:
--

There is a test case that uses RouteResponse, which has the same access level 
as RouteException, and it works. It may be that this is because of the 
friendly package access. I'll test both RouteResponse and RouteException 
outside the package and see if they are accessible.

bq. Secondly, since the responseFutures are per URL, won't two update requests 
on the same server overwrite the entries?

When an exception is thrown from one of the routed servers the batch will 
terminate on that server. So each URL in RouteException should have only one 
exception.


> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781868#comment-13781868
 ] 

Shai Erera commented on LUCENE-5189:


I don't mind if we backport the whole thing in one commit. I just thought it 
would be cleaner to backport each issue's commits. I doubt anyone would "hit" 
an issue within the couple of hours it will take. But I'll do this in one 
backport.

bq. Don't port stuff to the stable branch unless you'd be happy to release it 
tomorrow

I agree, though what if we decide to release 5.0 in one month? Do we revert the 
whole feature? I just think that it's software, and software always improves. 
Even if we optimize the way updates are kept (the problem is in 
ReaderAndLiveDocs), it can always be improved even more tomorrow. That's why 
the feature is marked @lucene.experimental -- it may not be the most optimized 
thing, but it works and more importantly - it doesn't affect users who don't 
use it ("do no harm").

I will look into improving the way updates are kept in RALD 
(Map<String,Map<Integer,Long>>), though honestly, we have no data points as to 
whether it's efficient or not, or whether the new structure is more efficient. 
What I think we can do is conceptually keep the updates in a pair of int[] and 
long[] arrays (maybe one of those **Buffer classes we have, for better 
compression). I'll start w/ that.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781854#comment-13781854
 ] 

Shalin Shekhar Mangar commented on SOLR-4816:
-

bq. You would get an exception that you could trace to one of the servers. The 
exception itself would have to have the info needed to determine what document 
failed due to optimistic locking.

[~joel.bernstein] - I'm a little confused about how to tracking failures with 
CloudSolrServer on update requests. Firstly, the RouteException class is 
private to CloudSolrServer and cannot be used at all. Secondly, since the 
responseFutures are per URL, won't two update requests on the same server 
overwrite the entries?
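
A toy illustration of the overwrite concern (names mirror the discussion, not 
the exact CloudSolrServer internals):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class RoutedFutures {
  public static void main(String[] args) {
    ExecutorService executor = Executors.newFixedThreadPool(2);
    Map<String, Future<String>> responseFutures =
        new HashMap<String, Future<String>>();
    String leaderUrl = "http://host:8983/solr/collection1";
    responseFutures.put(leaderUrl, executor.submit(new Callable<String>() {
      public String call() { return "batch-1"; }
    }));
    // a second request routed to the same leader URL replaces the first
    // future in the map -- that entry, and its outcome, is lost
    responseFutures.put(leaderUrl, executor.submit(new Callable<String>() {
      public String call() { return "batch-2"; }
    }));
    System.out.println(responseFutures.size()); // prints 1, not 2
    executor.shutdown();
  }
}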

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-4816) Add document routing to CloudSolrServer

2013-09-30 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781854#comment-13781854
 ] 

Shalin Shekhar Mangar edited comment on SOLR-4816 at 9/30/13 2:25 PM:
--

bq. You would get an exception that you could trace to one of the servers. The 
exception itself would have to have the info needed to determine what document 
failed due to optimistic locking.

[~joel.bernstein] - I'm a little confused about how to track failures with 
CloudSolrServer for update requests. Firstly, the RouteException class is 
private to CloudSolrServer and cannot be used at all. Secondly, since the 
responseFutures are per URL, won't two update requests on the same server 
overwrite the entries?


was (Author: shalinmangar):
bq. You would get an exception that you could trace to one of the servers. The 
exception itself would have to have the info needed to determine what document 
failed due to optimistic locking.

[~joel.bernstein] - I'm a little confused about how to tracking failures with 
CloudSolrServer on update requests. Firstly, the RouteException class is 
private to CloudSolrServer and cannot be used at all. Secondly, since the 
responseFutures are per URL, won't two update requests on the same server 
overwrite the entries?

> Add document routing to CloudSolrServer
> ---
>
> Key: SOLR-4816
> URL: https://issues.apache.org/jira/browse/SOLR-4816
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Affects Versions: 4.3
>Reporter: Joel Bernstein
>Assignee: Mark Miller
>Priority: Minor
> Fix For: 4.5, 5.0
>
> Attachments: RequestTask-removal.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, 
> SOLR-4816.patch, SOLR-4816.patch, SOLR-4816.patch, SOLR-4816-sriesenberg.patch
>
>
> This issue adds the following enhancements to CloudSolrServer's update logic:
> 1) Document routing: Updates are routed directly to the correct shard leader 
> eliminating document routing at the server.
> 2) Optional parallel update execution: Updates for each shard are executed in 
> a separate thread so parallel indexing can occur across the cluster.
> These enhancements should allow for near linear scalability on indexing 
> throughput.
> Usage:
> CloudSolrServer cloudClient = new CloudSolrServer(zkAddress);
> cloudClient.setParallelUpdates(true); 
> SolrInputDocument doc1 = new SolrInputDocument();
> doc1.addField("id", "0");
> doc1.addField("a_t", "hello1");
> SolrInputDocument doc2 = new SolrInputDocument();
> doc2.addField("id", "2");
> doc2.addField("a_t", "hello2");
> UpdateRequest request = new UpdateRequest();
> request.add(doc1);
> request.add(doc2);
> request.setAction(AbstractUpdateRequest.ACTION.OPTIMIZE, false, false);
> NamedList response = cloudClient.request(request); // Returns a backwards 
> compatible condensed response.
> //To get more detailed response down cast to RouteResponse:
> CloudSolrServer.RouteResponse rr = (CloudSolrServer.RouteResponse)response;



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-3530.
---

Resolution: Fixed

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> spin-off from SOLR-3258: it would be helpful if SolrJ, when encountering 
> Exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser), did some validation of the Content-Type returned by 
> the remote server to give better error messages in cases where 
> misconfiguration has the wrong matchup between ResponseParser and mime-type 
> (or worse: an HTML page being returned by a non-solr server)



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-5291) Solrj does not propagate the root cause to the user for many errors.

2013-09-30 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller resolved SOLR-5291.
---

Resolution: Fixed

> Solrj does not propagate the root cause to the user for many errors.
> 
>
> Key: SOLR-5291
> URL: https://issues.apache.org/jira/browse/SOLR-5291
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
> Attachments: SOLR-5291.patch
>
>
> This is a frustrating little bug because it forces you to look at the logs 
> for any insight into what happened - the error message in the exception you 
> get with Solrj is very generic.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[JENKINS-MAVEN] Lucene-Solr-Maven-4.x #462: POMs out of sync

2013-09-30 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-4.x/462/

No tests ran.

Build Log:
[...truncated 11872 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-30 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781810#comment-13781810
 ] 

Simon Willnauer commented on LUCENE-5189:
-

bq. I suppose we could also wait some more (the test did uncover a big issue 
recently), but I don't think we need to...

I think we should apply a general rule here that we (at least I believe so) in 
Lucene land have had for a long time: "Don't port stuff to the stable branch 
unless you'd be happy to release it tomorrow." I looked at the change and 
honestly it looks pretty scary to me. I'd not be happy with this being 
released tomorrow. We can happily aim for this being in 4.6, but baking in 
should happen in trunk. This is a huge change and I am thankful for the work 
that has been done on it, but I think we should not rush it into the stable 
branch. This should IMO bake more in trunk and have several rounds of 
optimization to its data structures before we go and release it.



> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781802#comment-13781802
 ] 

ASF subversion and git services commented on SOLR-3530:
---

Commit 1527554 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1527554 ]

SOLR-5291: Solrj does not propagate the root cause to the user for many errors.
SOLR-3530: Better error messages / Content-Type validation in SolrJ.

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> spin-off from SOLR-3258: it would be helpful if SolrJ, when encountering 
> Exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser), did some validation of the Content-Type returned by 
> the remote server to give better error messages in cases where 
> misconfiguration has the wrong matchup between ResponseParser and mime-type 
> (or worse: an HTML page being returned by a non-solr server)



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5291) Solrj does not propagate the root cause to the user for many errors.

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781801#comment-13781801
 ] 

ASF subversion and git services commented on SOLR-5291:
---

Commit 1527554 from [~markrmil...@gmail.com] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1527554 ]

SOLR-5291: Solrj does not propagate the root cause to the user for many errors.
SOLR-3530: Better error messages / Content-Type validation in SolrJ.

> Solrj does not propagate the root cause to the user for many errors.
> 
>
> Key: SOLR-5291
> URL: https://issues.apache.org/jira/browse/SOLR-5291
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
> Attachments: SOLR-5291.patch
>
>
> This is a frustrating little bug because it forces you to look at the logs 
> for any insight into what happened - the error message in the exception you 
> get with Solrj is very generic.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Stefan Matheis (steffkes) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781800#comment-13781800
 ] 

Stefan Matheis (steffkes) commented on SOLR-5287:
-

Actually that should be pretty simple? A possibility to {{POST}} a file (or 
the content as a string .. whatever is easier) to an endpoint on the server, 
which stores the file on disk or in ZK. Perhaps with an option to perform a 
core reload to get the new schema/configuration activated?

Then we can think about (more or less) fancy stuff in the UI .. something like 
a typical html {{textarea}} w/o syntax highlighting and such .. up to a fancy 
wizard where you can select (predefined) parts of a schema/configuration and 
enable them with a click :o

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side, he doesn't see a 
> problem on that end. His comment is there's no end point that'll write the 
> file back.
> Am I missing something here or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seem troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system, maliciously or not. I don't think this is an 
> issue; this would be part of the admin handler after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> Because what scares me is this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA, the two 
> immediate sub-JIRAs that spring to mind will be the UI work and the endpoints 
> for the UI work to use.
> I think there are only two end-points here
> 1> list all the files in the conf (or arbitrary from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over it won't hurt my feelings a bit.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781799#comment-13781799
 ] 

Robert Muir commented on LUCENE-5189:
-

I don't think we should do that: then tests may fail, users hit too many open 
files, hit exceptions when segments don't contain their field, etc., in 4.x, 
and we say "sorry, 4.x is only temporarily broken: maybe you should try trunk 
for a more stable user experience?"


> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates, however the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update, if e.g. we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I have some working patch already which I'll upload next, explaining the 
> changes.
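
To make the "write all the values for the updated field" idea concrete, here is 
a minimal sketch, assuming the resolved updates arrive as a doc-to-value map 
and the segment's current values can be read in doc order. The names are 
illustrative; this is not Lucene's actual writer code.

{code:java}
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of rewriting all values of one NDV field for a segment
// (the "like livedocs" approach above). `currentValues` stands in for the
// segment's existing per-document values and `updates` for the resolved
// doc-to-value updates; neither name is from Lucene's real code.
public class NumericFieldRewriteSketch {
    static long[] rewriteField(long[] currentValues, Map<Integer, Long> updates) {
        long[] merged = new long[currentValues.length];
        // Single sequential pass over every doc in the segment: take the
        // updated value if the doc was updated, otherwise keep the old one.
        for (int doc = 0; doc < currentValues.length; doc++) {
            Long updated = updates.get(doc);
            merged[doc] = (updated != null) ? updated : currentValues[doc];
        }
        return merged; // in Lucene this would be written through a fieldsConsumer
    }

    public static void main(String[] args) {
        long[] current = {10L, 20L, 30L, 40L};
        Map<Integer, Long> updates = new HashMap<>();
        updates.put(1, 99L); // last update wins for doc 1
        System.out.println(java.util.Arrays.toString(rewriteField(current, updates)));
        // prints [10, 99, 30, 40]
    }
}
{code}

The single in-order pass is what keeps the sequential-read requirement cheap: 
each doc is visited once, and an updated value simply overrides the stored one.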



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-30 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781797#comment-13781797
 ] 

Shai Erera commented on LUCENE-5189:


I wrote (somewhere, I don't remember where) that I'd like to backport 
issue-by-issue. So here I backport the changes done in this issue; after that 
I'll backport 5215, 5216 and 5246. That way the commits to 4x will be 
associated with the proper issues. Do you see a problem w/ that strategy? I 
definitely intend to backport all the changes, just in multiple commits, one 
per issue.

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV, etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I already have a working patch, which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5287) Allow at least solrconfig.xml and schema.xml to be edited via the admin screen

2013-09-30 Thread Erick Erickson (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5287?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781794#comment-13781794
 ] 

Erick Erickson commented on SOLR-5287:
--

[~steffkes] Nobody is saying this is a horrible idea; what's your vision for 
the endpoint needed to accomplish this?

> Allow at least solrconfig.xml and schema.xml to be edited via the admin screen
> --
>
> Key: SOLR-5287
> URL: https://issues.apache.org/jira/browse/SOLR-5287
> Project: Solr
>  Issue Type: Improvement
>  Components: Schema and Analysis, web gui
>Affects Versions: 4.5, 5.0
>Reporter: Erick Erickson
>Assignee: Erick Erickson
>
> A user asking a question on the Solr list got me to thinking about editing 
> the main config files from the Solr admin screen. I chatted briefly with 
> [~steffkes] about the mechanics of this on the browser side; he doesn't see a 
> problem on that end. His comment is that there's no endpoint that'll write the 
> file back.
> Am I missing something here, or is this actually not a hard problem? I see a 
> couple of issues off the bat, neither of which seems troublesome.
> 1> file permissions. I'd imagine lots of installations will get file 
> permission exceptions if Solr tries to write the file out. Well, do a 
> chmod/chown.
> 2> screwing up the system, maliciously or not. I don't think this is an issue; 
> this would be part of the admin handler, after all.
> Does anyone have objections to the idea? And how does this fit into the work 
> that [~sar...@syr.edu] has been doing?
> I can imagine this extending to SolrCloud with a "push this to ZK" option or 
> something like that, perhaps not in V1 unless it's easy.
> Of course any pointers gratefully received. Especially ones that start with 
> "Don't waste your effort, it'll never work (or be accepted)"...
> What scares me is that this seems like such an easy thing to do that would 
> be a significant ease-of-use improvement, so there _has_ to be something I'm 
> missing.
> So if we go forward with this we'll make this the umbrella JIRA; the two 
> immediate sub-JIRAs that spring to mind are the UI work and the endpoints 
> for the UI work to use.
> I think there are only two endpoints here (see the sketch below):
> 1> list all the files in the conf (or an arbitrary one from /collection) 
> directory.
> 2> write this text to this file
> Possibly later we could add "clone the configs from coreX to coreY".
> BTW, I've assigned this to myself so I don't lose it, but if anyone wants to 
> take it over, it won't hurt my feelings a bit.
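
For concreteness, here is a hedged sketch of a client for those two endpoints. 
The paths and parameters (/admin/file, ?op=list, ?file=) are invented for 
illustration only; SOLR-5287 hasn't defined a real API yet.

{code:java}
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical client for the two endpoints proposed above. The handler path
// and parameter names are illustrative guesses, not an existing Solr API.
public class ConfigEditClientSketch {
    private final String coreBaseUrl; // e.g. "http://localhost:8983/solr/collection1"

    public ConfigEditClientSketch(String coreBaseUrl) {
        this.coreBaseUrl = coreBaseUrl;
    }

    // Endpoint 1: list all the files in the conf directory.
    public String listConfFiles() throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
            new URL(coreBaseUrl + "/admin/file?op=list").openConnection();
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Endpoint 2: write this text to this file.
    public int writeConfFile(String fileName, String contents) throws Exception {
        HttpURLConnection conn = (HttpURLConnection)
            new URL(coreBaseUrl + "/admin/file?file=" + fileName).openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(contents.getBytes(StandardCharsets.UTF_8));
        }
        // A non-2xx code here would surface the file-permission problems above.
        return conn.getResponseCode();
    }
}
{code}

Using PUT for the write keeps it idempotent, which seems like the right fit for 
"replace this file's contents".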



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781793#comment-13781793
 ] 

ASF subversion and git services commented on SOLR-3530:
---

Commit 1527547 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1527547 ]

SOLR-5291: Solrj does not propagate the root cause to the user for many errors.
SOLR-3530: Better error messages / Content-Type validation in SolrJ.

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> Spin-off from SOLR-3258: it would be helpful if SolrJ, when encountering 
> exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser), did some validation of the Content-Type returned by 
> the remote server, to give better error messages in cases where a 
> misconfiguration has the wrong matchup between ResponseParser and MIME type 
> (or worse: an HTML page being returned by a non-Solr server).
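
A minimal sketch of the validation idea, assuming the client knows the MIME 
type its configured ResponseParser expects; the method and messages are 
illustrative, not SolrJ's actual internals.

{code:java}
// Sketch of the check proposed above: before handing the response body to the
// parser, compare the server's Content-Type header with the MIME type the
// configured parser expects. `expectedMimeType` is an assumed input.
public class ContentTypeCheckSketch {
    static void validateContentType(String headerValue, String expectedMimeType) {
        if (headerValue == null) {
            throw new IllegalStateException(
                "Server returned no Content-Type; expected " + expectedMimeType);
        }
        // Content-Type may carry a charset suffix, e.g. "text/html; charset=UTF-8".
        String mimeType = headerValue.split(";")[0].trim();
        if (!mimeType.equalsIgnoreCase(expectedMimeType)) {
            throw new IllegalStateException(
                "Expected " + expectedMimeType + " but got " + mimeType
                + " -- is the URL pointing at a real Solr server, and does the"
                + " ResponseParser match the wt parameter?");
        }
    }

    public static void main(String[] args) {
        validateContentType("application/xml; charset=UTF-8", "application/xml"); // passes
        try {
            validateContentType("text/html", "application/xml"); // the HTML-page case
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // the improved, specific error
        }
    }
}
{code}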



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5291) Solrj does not propagate the root cause to the user for many errors.

2013-09-30 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781792#comment-13781792
 ] 

ASF subversion and git services commented on SOLR-5291:
---

Commit 1527547 from [~markrmil...@gmail.com] in branch 'dev/trunk'
[ https://svn.apache.org/r1527547 ]

SOLR-5291: Solrj does not propagate the root cause to the user for many errors.
SOLR-3530: Better error messages / Content-Type validation in SolrJ.

> Solrj does not propagate the root cause to the user for many errors.
> 
>
> Key: SOLR-5291
> URL: https://issues.apache.org/jira/browse/SOLR-5291
> Project: Solr
>  Issue Type: Bug
>Reporter: Mark Miller
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
> Attachments: SOLR-5291.patch
>
>
> This is a frustrating little bug because it forces you to look at the logs 
> for any insight into what happened: the error message in the exception you 
> get from SolrJ is very generic.
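
To illustrate the shape of the fix, a tiny sketch: surface the server's own 
error detail in the client-side exception instead of a generic message. The 
names here are illustrative, not SolrJ's actual code.

{code:java}
// Sketch only: propagate the remote error detail into the exception the
// client sees, so the user doesn't have to dig through the server logs.
public class RootCauseSketch {
    static RuntimeException toClientException(int httpStatus, String remoteErrorBody) {
        String detail = (remoteErrorBody == null || remoteErrorBody.isEmpty())
            ? "(no detail returned by server)"
            : remoteErrorBody;
        return new RuntimeException(
            "Error from server (HTTP " + httpStatus + "): " + detail);
    }

    public static void main(String[] args) {
        // Instead of a generic "Server error", the user sees the root cause.
        System.out.println(toClientException(400, "undefined field 'foo'").getMessage());
    }
}
{code}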



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5189) Numeric DocValues Updates

2013-09-30 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13781788#comment-13781788
 ] 

Robert Muir commented on LUCENE-5189:
-

I'm only confused by the strategy of the backport: I see things like 
SegmentWriteState.isFieldUpdate here that were fixed in other issues.

Is this intentional? If the argument is to get user testing, I think to the 
user this would actually be "unbaking" if such a term exists...?

Can you svn merge the revs for LUCENE-5215 and any other issues (e.g., bug 
fixes) so we have one coherent backport?

> Numeric DocValues Updates
> -
>
> Key: LUCENE-5189
> URL: https://issues.apache.org/jira/browse/LUCENE-5189
> Project: Lucene - Core
>  Issue Type: New Feature
>  Components: core/index
>Reporter: Shai Erera
>Assignee: Shai Erera
> Attachments: LUCENE-5189-4x.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, LUCENE-5189.patch, 
> LUCENE-5189.patch, LUCENE-5189.patch
>
>
> In LUCENE-4258 we started to work on incremental field updates; however, the 
> amount of changes is immense and hard to follow/consume. The reason is that 
> we targeted postings, stored fields, DV, etc., all from the get-go.
> I'd like to start afresh here, with numeric-dv-field updates only. There are 
> a couple of reasons for that:
> * NumericDV fields should be easier to update if, e.g., we write all the 
> values of all the documents in a segment for the updated field (similar to 
> how livedocs work, and previously norms).
> * It's a fairly contained issue, attempting to handle just one data type to 
> update, yet it requires many changes to core code which will also be useful 
> for updating other data types.
> * It has value in and of itself, and we don't need to allow updating all the 
> data types in Lucene at once ... we can do that gradually.
> I already have a working patch, which I'll upload next, explaining the 
> changes.



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3530) better error messages / Content-Type validation in solrJ

2013-09-30 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3530:
--

Fix Version/s: 4.6
   5.0

> better error messages / Content-Type validation in solrJ
> 
>
> Key: SOLR-3530
> URL: https://issues.apache.org/jira/browse/SOLR-3530
> Project: Solr
>  Issue Type: Improvement
>Reporter: Hoss Man
>Assignee: Mark Miller
> Fix For: 5.0, 4.6
>
>
> Spin-off from SOLR-3258: it would be helpful if SolrJ, when encountering 
> exceptions from the ResponseParser (or perhaps before ever even handing data 
> to the ResponseParser), did some validation of the Content-Type returned by 
> the remote server, to give better error messages in cases where a 
> misconfiguration has the wrong matchup between ResponseParser and MIME type 
> (or worse: an HTML page being returned by a non-Solr server).



--
This message was sent by Atlassian JIRA
(v6.1#6144)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [VOTE] Release Lucene/Solr 4.5.0 RC5

2013-09-30 Thread Michael McCandless
+1, smoke tester is happy.

Mike McCandless

http://blog.mikemccandless.com


On Sat, Sep 28, 2013 at 12:59 PM, Adrien Grand  wrote:
> Please vote to release the following artifacts:
>   
> http://people.apache.org/~jpountz/staging_area/lucene-solr-4.5.0-RC5-rev1527178/
>
> Here is my +1.
>
> --
> Adrien
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: [JENKINS-MAVEN] Lucene-Solr-Maven-trunk #984: POMs out of sync

2013-09-30 Thread Mark Miller

On Sep 30, 2013, at 12:52 AM, Steve Rowe  wrote:

> I think we should have a test somewhere that complains when we have multiple 
> versions of the same dependency, across all Lucene and Solr modules.


I think that is a fine idea.

For some things, I'd like it if we actually kept the versions somewhere else - 
for instance, Hadoop dependencies should match across the mr module and the 
core module.

Perhaps we could define the versions for dependencies that should match across 
multiple modules in a properties file or Ant file, and use system-property 
substitution for them in the Ivy files.

For something like Hadoop, that would also make it simple to use Hadoop 1 
rather than 2 with a single system-property override. The same goes for some 
other dependencies.

Just thinking out loud at this point.
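
Something like this minimal sketch could fail the build when two modules pull 
different revs of the same artifact. The regex-based scan and class name are 
illustrative assumptions, not our real build code; an actual test would parse 
the XML properly.

{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.regex.*;
import java.util.stream.*;

// Sketch of the check Steve suggests: walk every ivy.xml in the checkout,
// record each dependency's rev, and complain when the same org/name shows up
// with more than one version across modules.
public class DependencyVersionCheckSketch {
    private static final Pattern DEP = Pattern.compile(
        "<dependency\\s+org=\"([^\"]+)\"\\s+name=\"([^\"]+)\"\\s+rev=\"([^\"]+)\"");

    public static void main(String[] args) throws IOException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        List<Path> ivyFiles;
        try (Stream<Path> walk = Files.walk(root)) {
            ivyFiles = walk.filter(p -> p.getFileName().toString().equals("ivy.xml"))
                           .collect(Collectors.toList());
        }
        Map<String, Set<String>> revsByDep = new TreeMap<>();
        for (Path p : ivyFiles) {
            Matcher m = DEP.matcher(Files.readString(p));
            while (m.find()) {
                revsByDep.computeIfAbsent(m.group(1) + ":" + m.group(2),
                    k -> new TreeSet<>()).add(m.group(3));
            }
        }
        boolean consistent = true;
        for (Map.Entry<String, Set<String>> e : revsByDep.entrySet()) {
            if (e.getValue().size() > 1) { // same dependency, different versions
                System.err.println("CONFLICT: " + e.getKey() + " -> " + e.getValue());
                consistent = false;
            }
        }
        if (!consistent) {
            throw new AssertionError("Inconsistent dependency versions across modules");
        }
    }
}
{code}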

- Mark
