[jira] [Closed] (SOLR-6032) NgramFilter dont keep token less than mingram size or greater than maxgram size

2014-04-29 Thread Kuntal Ganguly (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuntal Ganguly closed SOLR-6032.


Resolution: Duplicate

> NgramFilter dont keep token less than mingram size or greater than maxgram 
> size
> ---
>
> Key: SOLR-6032
> URL: https://issues.apache.org/jira/browse/SOLR-6032
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.2.1, 4.6.1
> Environment: Ubuntu12.04,4GB RAM, Quadcore Processor
>Reporter: Kuntal Ganguly
>  Labels: build, patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have a requirement for both partial and exact matching. Partial search works 
> fine with the NGramFilter for values within the mingram/maxgram size range. 
> However, when I try to index a value shorter than the mingram size, no tokens 
> are generated. The same thing happens when the value is longer than the maxgram 
> size.
> I have created a field type as shown below (the XML was stripped by the mail 
> archiver; the surviving fragments show positionIncrementGap="100" and an NGram 
> filter with maxGramSize="6" and preserveOriginal="true"; a reconstructed sketch 
> follows the issue description):
> When I try to index a value such as AB, it is not indexed and not searchable. 
> Similarly, if the value is GangulyKuntal (which is longer than the maxgram 
> size), the search does not work.
> Increasing the maxgram size beyond the largest anticipated value is not a good 
> design approach.
> NGramFilter should keep the original token if it is shorter than mingram or 
> longer than maxgram. Doing so would make it a truly partial as well as exact 
> search solution. It would be very helpful if this change were made in an 
> upcoming release. Any suggestions would be of great help.
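
For reference, here is a minimal sketch of an NGram-based field type along the 
lines the report describes. The field type name, tokenizer, lower-case filter 
and minGramSize value are assumptions; only positionIncrementGap="100", 
maxGramSize="6" and preserveOriginal="true" survive from the original snippet, 
and preserveOriginal is left out below because the keep-the-original behavior is 
exactly what the issue says is missing.

  <fieldType name="text_ngram" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- minGramSize is an assumption; maxGramSize="6" is taken from the
           surviving fragment of the report -->
      <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="6"/>
    </analyzer>
  </fieldType>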






[jira] [Closed] (SOLR-6033) LeftOuter Join capabilty in SOLR and dynamic field merge in response

2014-04-29 Thread Kuntal Ganguly (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kuntal Ganguly closed SOLR-6033.


Resolution: Won't Fix

> LeftOuter Join capabilty in SOLR and dynamic field merge in response
> 
>
> Key: SOLR-6033
> URL: https://issues.apache.org/jira/browse/SOLR-6033
> Project: Solr
>  Issue Type: New Feature
>  Components: documentation, search
>Affects Versions: 4.2.1, 4.3, 4.5.1, 4.6.1
> Environment: RedHat Linux, 6GB Ram, Core2Duo Processor
>Reporter: Kuntal Ganguly
>  Labels: build, feature, patch
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I have different kinds of entities in the index.
> Entity-1: id, doc_name, type, pinprojectid, documentid, content
> Entity-2: id, proj_name, projtype, type, pinprojectid
> The type field is unique for each entity, e.g. Entity-1 (type=Documents) and 
> Entity-2 (type=Projects); pinprojectid is common to both entities.
> Now I search on type:Document AND content:"hello", but I want the result to do 
> a left outer join with Entity-2 on a join field (say pinprojectid), fetch a few 
> fields (projtype, proj_name), and display them in the Entity-1 response.
> Say the Entity-1 search gives 12 results, but the left-outer-join field matches 
> only 10 of them. The final output should then be 12 documents, 10 of which 
> carry the extra merged fields from the left outer join.
> This is very common in SQL.
> One way to do this is to process it client-side with two separate calls to the 
> Solr server, but this functionality or enhancement needs to be in a Solr 
> release in a generalized way.
> Let me know if there is any other way to achieve the above scenario from the 
> server side of Solr in one call.
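
For context, a rough sketch of Solr's existing join query parser applied to this 
example; it can restrict the Entity-1 results to those with a matching Entity-2 
document, but it only filters and does not merge Entity-2 fields into the 
response, which is the capability this issue asks for:

  q=type:Document AND content:"hello"
  fq={!join from=pinprojectid to=pinprojectid}type:Projects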






[jira] [Updated] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded

2014-04-29 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta updated SOLR-5681:
---

Attachment: SOLR-5681.patch

Updated patch, though it has some failing tests.

> Make the OverseerCollectionProcessor multi-threaded
> ---
>
> Key: SOLR-5681
> URL: https://issues.apache.org/jira/browse/SOLR-5681
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
> Attachments: SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, 
> SOLR-5681.patch, SOLR-5681.patch
>
>
> Right now, the OverseerCollectionProcessor is single-threaded, i.e. submitting 
> anything long-running blocks the processing of other, mutually exclusive tasks.
> When OCP tasks become optionally async (SOLR-5477), it'd be good to have 
> truly non-blocking behavior by multi-threading the OCP itself.
> For example, a ShardSplit call on Collection1 would block the thread and 
> thereby prevent processing of a create-collection task (which would stay queued 
> in zk), even though the two tasks are mutually exclusive.
> Here are a few of the challenges:
> * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An 
> easy way to handle that is to only let 1 task per collection run at a time.
> * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
> The task from the workQueue is only removed on completion so that in case of 
> a failure, the new Overseer can re-consume the same task and retry. A queue 
> is not the right data structure in the first place to look ahead i.e. get the 
> 2nd task from the queue when the 1st one is in process. Also, deleting tasks 
> which are not at the head of a queue is not really an 'intuitive' thing.
> Proposed solutions for task management:
> * Task funnel and peekAfter(): The parent thread is responsible for getting 
> and passing the request to a new thread (or one from the pool). The parent 
> method uses a peekAfter(last element) instead of a peek(). The peekAfter 
> returns the task after the 'last element'. Maintain this request information 
> and use it for deleting/cleaning up the workQueue.
> * Another (almost duplicate) queue: While offering tasks to workQueue, also 
> offer them to a new queue (call it volatileWorkQueue?). The difference is, as 
> soon as a task from this is picked up for processing by the thread, it's 
> removed from the queue. At the end, the cleanup is done from the workQueue.






[jira] [Commented] (SOLR-5981) Please change method visibility of getSolrWriter in DataImportHandler to public (or at least protected)

2014-04-29 Thread Shawn Heisey (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985143#comment-13985143
 ] 

Shawn Heisey commented on SOLR-5981:


I haven't looked at the other patch, but James understands the code a lot 
better than I do. I would expect that his patch is better.

> Please change method visibility of getSolrWriter in DataImportHandler to 
> public (or at least protected)
> ---
>
> Key: SOLR-5981
> URL: https://issues.apache.org/jira/browse/SOLR-5981
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
> Environment: Linux 3.13.9-200.fc20.x86_64
> Solr 4.6.0
>Reporter: Aaron LaBella
>Assignee: Shawn Heisey
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5981.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I've been using the org.apache.solr.handler.dataimport.DataImportHandler for 
> a bit and it's an excellent model and architecture.  I'd like to extend it to 
> plug in my own DIHWriter, but the code doesn't allow for it.
> Please change ~line 227 in the DataImportHandler class to be:
> public SolrWriter getSolrWriter
> instead of:
> private SolrWriter getSolrWriter
> or, at a minimum, protected, so that I can extend DataImportHandler and 
> override this method.
> Thank you *sincerely* in advance for the quick turn-around on this.  If the 
> change can be made in 4.6.0 and upstream, that'd be ideal.
> Thanks!






[jira] [Commented] (SOLR-6034) Use a wildcard in order to delete fields with Atomic Update

2014-04-29 Thread Alexandre Rafalovitch (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985123#comment-13985123
 ] 

Alexandre Rafalovitch commented on SOLR-6034:
-

I don't think this issue belongs to the DataImportHandler component, though I am 
not sure which component it should be.

> Use a wildcard in order to delete fields with Atomic Update
> ---
>
> Key: SOLR-6034
> URL: https://issues.apache.org/jira/browse/SOLR-6034
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.7
>Reporter: Constantin Muraru
>
> As discussed on the SOLR user group, it would be a great feature to be able to 
> remove all fields matching a pattern, using Atomic Updates.
> Example (the update XML was stripped by the mail archiver; it showed a document 
> with id 100 and a removal update on fields matching *_day_i):
> The *_day_i pattern should be expanded server-side and all fields matching it 
> should be removed from the specified document.
> Workaround: when removing fields from a document, we can query SOLR from the 
> client to see which fields are actually present for the specific document, and 
> then create the XML update document to be sent to SOLR. However, this increases 
> the number of queries to SOLR, and for a large number of documents it becomes 
> quite costly. It would be great, performance-wise and simplicity-wise, to be 
> able to provide wildcards.






Re: Simple unit test doco?

2014-04-29 Thread Greg Pendlebury
Thank-you for the tips. I still haven't had a single run pass out of around
20 attempts now on both the server and my desktop, but I'll just keep
chipping away at it.

Ta,
Greg


On 29 April 2014 14:00, Tomás Fernández Löbbe  wrote:

> Many times cloud-related/distrib tests fail due to timeouts; this could be
> related to the overall load of your computer (probably generated by the
> tests themselves). I don't know if this is the correct way, but I found that
> they are much less likely to fail if I use fewer JVMs to run the
> tests (by default my mac would use 4, but I set it to 2 if I see failures;
> you can set the parameter "tests.jvms" when running ant test).
>
> If you are working on some specific component you can filter which tests
> to run in many ways, see “ant test-help”. It may be useful to use
> tests.slow=false to skip the slow tests in most of your runs.
>
> "do I need to turn on a ZK server for integration testing?”
> No, you don’t. Solr will start an embedded Zookeeper for the tests.
>
> "I've tried running those tests in isolation via IntelliJ and they all
> report as passing”
> Most probably it is not related to this, but just in case: when you try to
> reproduce a test failure you saw, make sure to use the same seed
> (-Dtests.seed). The seed used should be in the output of the test run where you
> saw the failure. (An example invocation combining these flags appears after
> this quoted reply.)
>
>
>[junit4] Tests with failures:
>[junit4]   - org.apache.solr.hadoop.MorphlineMapperTest (suite)
>[junit4]
>
> Sorry, no idea about this one.
>
>
> On Mon, Apr 28, 2014 at 7:47 PM, Greg Pendlebury <
> greg.pendleb...@gmail.com> wrote:
>
>> Heyo,
>>
>> I'm wondering if there is any additional doco and/or tricks for unit
>> testing Solr beyond this wiki page? http://wiki.apache.org/solr/TestingSolr
>>
>> Some details about my troubles are below if anyone cares to read, but I'm
>> not so much looking for specific responses to why individual tests are
>> failing. I'm more trying to work out whether I'm on the right track or
>> missing some key information... like do I need to turn on a ZK server for
>> integration testing?
>>
>> Or do I need to accept failed unit tests as a baseline before applying
>> our patch? I don't typically like that, but this is an enormous test suite
>> and I'd be happy just to get a pass up to the same level that 4.7.2 had
>> prior to release.
>>
>> Ta,
>> Greg
>>
>>
>> Details
>> ==
>> I downloaded the tagged 4.7.2 release yesterday to apply a patch our team
>> wants to test, but even before touching the codebase at all I cannot get
>> the unit tests to pass. I'm struggling to even get consistent results.
>>
>> The most useful two end points I reach are:
>>[junit4] Tests with failures:
>>[junit4]   -
>> org.apache.solr.cloud.CustomCollectionTest.testDistribSearch
>>[junit4]   -
>> org.apache.solr.cloud.DistribCursorPagingTest.testDistribSearch
>>[junit4]   - org.apache.solr.cloud.DistribCursorPagingTest (suite)
>>[junit4]
>> ...
>>[junit4] Execution time total: 2 hours 6 minutes 50 seconds
>>[junit4] Tests summary: 365 suites, 1570 tests, 1 suite-level error, 2
>> errors, 187 ignored (12 assumptions)
>>
>> And another one (don't have the terminal output on hand unfortunately) in
>> the cloudera morphline suite. It is the same error as this though and fails
>> after around an hour:
>> http://mail-archives.apache.org/mod_mbox/flume-dev/201310.mbox/%3ccac6yyrj2cv89hntdeel7t0qlq8zjbwjynbtcveucxlzdmyv...@mail.gmail.com%3E
>>
>> I've tried running those tests in isolation via IntelliJ and they all
>> report as passing... the logs show exceptions about ZK session expiry for
>> some (not all) but I assume those are trapped expected exceptions since
>> JUnit is passing them?
>>
>> Given the response in the message I linked just above re: windows support
>> I tried shifting the build up to a RHEL6 server this morning but I've tried
>> two runs now and both failed with this odd error:
>>[junit4] Tests with failures:
>>[junit4]   - org.apache.solr.hadoop.MorphlineMapperTest (suite)
>>[junit4]
>> ...
>>[junit4] Execution time total: 42 seconds
>>[junit4] Tests summary: 7 suites, 35 tests, 2 suite-level errors, 5
>> ignored
>>
>> I only say odd because they run for half an hour and then report 42
>> seconds.
>>
>> Thanks again if you've read all this.
>>
>
>
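
Putting the flags mentioned in the reply above together, an example invocation 
might look like this (the test case name and seed value are placeholders, not 
taken from this thread):

  ant test -Dtests.jvms=2 -Dtests.slow=false -Dtestcase=SomeFailingTest -Dtests.seed=DEADBEEF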


[JENKINS] Lucene-Solr-trunk-MacOSX (64bit/jdk1.8.0) - Build # 1537 - Failure!

2014-04-29 Thread Policeman Jenkins Server
Build: http://jenkins.thetaphi.de/job/Lucene-Solr-trunk-MacOSX/1537/
Java: 64bit/jdk1.8.0 -XX:-UseCompressedOops -XX:+UseParallelGC

All tests passed

Build Log:
[...truncated 10805 lines...]
   [junit4] JVM J0: stderr was not empty, see: 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp/junit4-J0-20140429_214136_861.syserr
   [junit4] >>> JVM J0: stderr (verbatim) 
   [junit4] java(214,0x13cbd6000) malloc: *** error for object 0x116f0: 
pointer being freed was not allocated
   [junit4] *** set a breakpoint in malloc_error_break to debug
   [junit4] <<< JVM J0: EOF 

[...truncated 1 lines...]
   [junit4] ERROR: JVM J0 ended with an exception, command line: 
/Library/Java/JavaVirtualMachines/jdk1.8.0_05.jdk/Contents/Home/jre/bin/java 
-XX:-UseCompressedOops -XX:+UseParallelGC -XX:+HeapDumpOnOutOfMemoryError 
-XX:HeapDumpPath=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/heapdumps 
-Dtests.prefix=tests -Dtests.seed=82540EB0BA2BFDCA -Xmx512M -Dtests.iters= 
-Dtests.verbose=false -Dtests.infostream=false -Dtests.codec=random 
-Dtests.postingsformat=random -Dtests.docvaluesformat=random 
-Dtests.locale=random -Dtests.timezone=random -Dtests.directory=random 
-Dtests.linedocsfile=europarl.lines.txt.gz -Dtests.luceneMatchVersion=5.0 
-Dtests.cleanthreads=perClass 
-Djava.util.logging.config.file=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/logging.properties
 -Dtests.nightly=false -Dtests.weekly=false -Dtests.monster=false 
-Dtests.slow=true -Dtests.asserts.gracious=false -Dtests.multiplier=1 
-DtempDir=. -Djava.io.tmpdir=. 
-Djunit4.tempDir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/test/temp
 
-Dclover.db.dir=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/clover/db
 -Djava.security.manager=org.apache.lucene.util.TestSecurityManager 
-Djava.security.policy=/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/tools/junit4/tests.policy
 -Dlucene.version=5.0-SNAPSHOT -Djetty.testMode=1 -Djetty.insecurerandom=1 
-Dsolr.directoryFactory=org.apache.solr.core.MockDirectoryFactory 
-Djava.awt.headless=true -Djdk.map.althashing.threshold=0 
-Dtests.leaveTemporary=false -Dtests.filterstacks=true -Dtests.disableHdfs=true 
-classpath 
/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/test:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/test-framework/lib/junit4-ant-2.1.3.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/src/test-files:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/test-framework/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-solrj/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/build/solr-core/classes/java:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/common/lucene-analyzers-common-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/kuromoji/lucene-analyzers-kuromoji-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/analysis/phonetic/lucene-analyzers-phonetic-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/codecs/lucene-codecs-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/highlighter/lucene-highlighter-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/memory/lucene-memory-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/misc/lucene-misc-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/spatial/lucene-spatial-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/expressions/lucene-expressions-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/suggest/lucene-suggest-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/grouping/lucene-grouping-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queries/lucene-queries-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/queryparser/lucene-queryparser-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/lucene/build/join/lucene-join-5.0-SNAPSHOT.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/antlr-runtime-3.5.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/asm-commons-4.1.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-cli-1.2.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-codec-1.9.jar:/Users/jenkins/workspace/Lucene-Solr-trunk-MacOSX/solr/core/lib/commons-configuration-1.6.jar:/Users/jenk

[jira] [Resolved] (LUCENE-5611) Simplify the default indexing chain

2014-04-29 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless resolved LUCENE-5611.


Resolution: Fixed

> Simplify the default indexing chain
> ---
>
> Key: LUCENE-5611
> URL: https://issues.apache.org/jira/browse/LUCENE-5611
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5611.patch, LUCENE-5611.patch
>
>
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs.  Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes e.g. the new chain requires all fields to have the
> same TV options (rather than auto-upgrading all fields by the same
> name that the current chain does)...






[jira] [Commented] (LUCENE-5611) Simplify the default indexing chain

2014-04-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984849#comment-13984849
 ] 

ASF subversion and git services commented on LUCENE-5611:
-

Commit 1591116 from [~mikemccand] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1591116 ]

LUCENE-5611: simplify the default indexing chain

> Simplify the default indexing chain
> ---
>
> Key: LUCENE-5611
> URL: https://issues.apache.org/jira/browse/LUCENE-5611
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5611.patch, LUCENE-5611.patch
>
>
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs.  Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes e.g. the new chain requires all fields to have the
> same TV options (rather than auto-upgrading all fields by the same
> name that the current chain does)...






[jira] [Commented] (SOLR-6034) Use a wildcard in order to delete fields with Atomic Update

2014-04-29 Thread Constantin Muraru (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984754#comment-13984754
 ] 

Constantin Muraru commented on SOLR-6034:
-

Let me provide a better example. Suppose we have documents like this (the 
document XML was stripped by the mail archiver; it showed a doc with id 100 and 
several dynamic per-day fields holding the values 1, 5 and 7):

The schema looks the usual way (the dynamicField definition was also stripped by 
the archiver). The dynamic field pattern I'm using is this: id_day_i.

Each day I want to add new fields for the current day and remove the fields for 
the oldest one.


(The atomic-update XML was stripped by the mail archiver; it showed an update on 
doc id 100 setting the current day's fields with the values 25 and 1. A 
reconstructed sketch follows below.)


The problem is, I don't know the exact names of the fields I want to remove. 
All I know is that they end in *_1600_i. It is not currently possible to 
specify a wildcard when sending an atomic update to SOLR.

When removing fields from a document, I want to avoid querying SOLR from the 
client to see which fields are actually present for the specific document; that 
way I can hopefully speed up the process. Querying the schema.xml is not going 
to help much either, since the field is defined as a dynamic field *_i. This 
makes me think that expanding the documents client-side is not the best way to 
do it.
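
For illustration, a minimal sketch of what such a daily atomic update might look 
like, including the requested wildcard removal. The field names are 
hypothetical, and the wildcard in the last field name is the feature being 
requested here, not something Solr currently supports; update="set" with 
null="true" is the usual syntax for removing a single, explicitly named field.

  <add>
    <doc>
      <field name="id">100</field>
      <!-- set the current day's fields (field names are hypothetical) -->
      <field name="visits_1601_i" update="set">25</field>
      <field name="clicks_1601_i" update="set">1</field>
      <!-- requested: remove every field matching the previous day's pattern -->
      <field name="*_1600_i" update="set" null="true"/>
    </doc>
  </add>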

> Use a wildcard in order to delete fields with Atomic Update
> ---
>
> Key: SOLR-6034
> URL: https://issues.apache.org/jira/browse/SOLR-6034
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.7
>Reporter: Constantin Muraru
>
> As discussed on the SOLR user group, it would be a great feature to be able to 
> remove all fields matching a pattern, using Atomic Updates.
> Example (the update XML was stripped by the mail archiver; it showed a document 
> with id 100 and a removal update on fields matching *_day_i):
> The *_day_i pattern should be expanded server-side and all fields matching it 
> should be removed from the specified document.
> Workaround: when removing fields from a document, we can query SOLR from the 
> client to see which fields are actually present for the specific document, and 
> then create the XML update document to be sent to SOLR. However, this increases 
> the number of queries to SOLR, and for a large number of documents it becomes 
> quite costly. It would be great, performance-wise and simplicity-wise, to be 
> able to provide wildcards.






[jira] [Created] (SOLR-6034) Use a wildcard in order to delete fields with Atomic Update

2014-04-29 Thread Constantin Muraru (JIRA)
Constantin Muraru created SOLR-6034:
---

 Summary: Use a wildcard in order to delete fields with Atomic 
Update
 Key: SOLR-6034
 URL: https://issues.apache.org/jira/browse/SOLR-6034
 Project: Solr
  Issue Type: Improvement
  Components: contrib - DataImportHandler
Affects Versions: 4.7
Reporter: Constantin Muraru


As discussed on the SOLR user group, it would be a great feature to be able to 
remove all fields matching a pattern, using Atomic Updates.

Example (the update XML was stripped by the mail archiver; it showed a document 
with id 100 and a removal update on fields matching *_day_i):

The *_day_i pattern should be expanded server-side and all fields matching it 
should be removed from the specified document.

Workaround: when removing fields from a document, we can query SOLR from the 
client to see which fields are actually present for the specific document, and 
then create the XML update document to be sent to SOLR. However, this increases 
the number of queries to SOLR, and for a large number of documents it becomes 
quite costly. It would be great, performance-wise and simplicity-wise, to be 
able to provide wildcards.






[jira] [Closed] (SOLR-5386) Solr hangs on spellcheck.maxCollationTries

2014-04-29 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer closed SOLR-5386.


Resolution: Not a Problem

The user said this is not a problem anymore, so closing.

> Solr hangs on spellcheck.maxCollationTries
> --
>
> Key: SOLR-5386
> URL: https://issues.apache.org/jira/browse/SOLR-5386
> Project: Solr
>  Issue Type: Bug
>  Components: spellchecker
>Affects Versions: 4.4, 4.5
>Reporter: Jeroen Steggink
>  Labels: collate, maxCollationTries, spellcheck
>
> When spellcheck.maxCollationTries is set (>0), Solr hangs when the 
> requestHandler it is configured on is set to default="true".
> When I make another requestHandler the default, one without maxCollationTries, 
> all requestHandlers work just fine.
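
For context, a minimal sketch of the kind of configuration being described: a 
default search handler with collation enabled and maxCollationTries set. The 
handler name, the component wiring and the other parameter values are 
illustrative, not taken from the report.

  <requestHandler name="/select" class="solr.SearchHandler" default="true">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.collate">true</str>
      <!-- the parameter the report associates with the hang -->
      <str name="spellcheck.maxCollationTries">5</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>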






[jira] [Commented] (SOLR-5981) Please change method visibility of getSolrWriter in DataImportHandler to public (or at least protected)

2014-04-29 Thread James Dyer (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984700#comment-13984700
 ] 

James Dyer commented on SOLR-5981:
--

Aaron,

Please try the up-to-date patch I just attached to SOLR-3671.  I believe this 
is a cleaner approach to your goal.  With SOLR-3671 applied, all you have to do 
is create a DIHWriter implementation, then specify "writerImpl=classname" on 
your request (see oas.handler.dataimport.TestWriterImpl for a concrete 
example).  It should write the documents created by DIH to your custom Writer 
rather than to the default SolrWriter.

If the fix on SOLR-3671 meets your needs, then we can commit that rather than 
this one.  If it doesn't, please explain clearly why SOLR-3671 is inadequate.  
Thanks!
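
For illustration, a sketch of how a custom writer might be wired in once the 
SOLR-3671 patch is applied. The class name is a placeholder, and supplying 
writerImpl through the handler defaults is just one way of putting the parameter 
on the request.

  <requestHandler name="/dataimport"
                  class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">data-config.xml</str>
      <!-- hypothetical custom DIHWriter implementation -->
      <str name="writerImpl">com.example.MyDIHWriter</str>
    </lst>
  </requestHandler>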

> Please change method visibility of getSolrWriter in DataImportHandler to 
> public (or at least protected)
> ---
>
> Key: SOLR-5981
> URL: https://issues.apache.org/jira/browse/SOLR-5981
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0
> Environment: Linux 3.13.9-200.fc20.x86_64
> Solr 4.6.0
>Reporter: Aaron LaBella
>Assignee: Shawn Heisey
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-5981.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I've been using the org.apache.solr.handler.dataimport.DataImportHandler for 
> a bit and it's an excellent model and architecture.  I'd like to extend it to 
> plug in my own DIHWriter, but the code doesn't allow for it.
> Please change ~line 227 in the DataImportHandler class to be:
> public SolrWriter getSolrWriter
> instead of:
> private SolrWriter getSolrWriter
> or, at a minimum, protected, so that I can extend DataImportHandler and 
> override this method.
> Thank you *sincerely* in advance for the quick turn-around on this.  If the 
> change can be made in 4.6.0 and upstream, that'd be ideal.
> Thanks!






[jira] [Updated] (SOLR-3671) DIH doesn't use its own interface + writerImpl has no information about the request

2014-04-29 Thread James Dyer (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer updated SOLR-3671:
-

Attachment: SOLR-3671.patch

Updated patch.  All tests pass.

> DIH doesn't use its own interface + writerImpl has no information about the 
> request
> ---
>
> Key: SOLR-3671
> URL: https://issues.apache.org/jira/browse/SOLR-3671
> Project: Solr
>  Issue Type: Improvement
>  Components: contrib - DataImportHandler
>Affects Versions: 4.0-ALPHA, 4.0-BETA
>Reporter: Roman Chyla
>Assignee: James Dyer
>Priority: Minor
> Attachments: SOLR-3671.patch, SOLR-3671.patch
>
>
> The use case: I would like to extend DIH by providing a new writer. I have 
> tried everything but can't accomplish it without either a) duplicating the 
> whole DataImportHandler or b) Java reflection tricks. Almost everything inside 
> DIH is private, and the mechanism to instantiate a new writer based on 
> 'writerImpl' seems to be lacking important functionality.
> It doesn't give the new class a chance to get information about the request or 
> the update processor. Also, the writer is instantiated twice (when 'writerImpl' 
> is there), which is really unnecessary.
> As a solution, the existing DataImportHandler.getSolrWriter() should 
> instantiate the appropriate writer and pass it to DocBuilder (it already does 
> that for SolrWriter), and DocBuilder doesn't need to create a second 
> (duplicate) writer.






Re: yet another one (mad) approach for join

2014-04-29 Thread Mikhail Khludnev
Let me challenge it another way.

If we look at Lucene's JoinUtil, it works in two phases:
- loop over the "from"-side docs, getting a term per docNum via
DocValues/FieldCache;
- search those term texts in the "to"-side field.

The intermediate term texts seem like a really inconvenient thing.
What if we had a numeric docValues field containing the parent's docNum? Then we
would just need to loop over the "from"-side docs, get the parent docNum, and
drop it into a bitset. It seems like low-hanging fruit, and it could already
improve over JoinUtil.

The miserable question is how to index a parent docNum into a docValues field.
So far, we can do that with updatable DocValues. Thanks to Shai!

I'd like to know what you think so far.

But coming back to block-join, it's noticeable that it has an uber-performance
hint: it can leap-frog from parent to child and vice versa. Hence, if the join
query above is intersected with a highly selective "to"-side filter, we waste
resources calculating the whole "to"-side bitset. Sad.

Let's go deeper, limiting ourselves to the single-value join case, aka 1:N
(to:from). Let's write the sequence of related docNums into a binary DocValues
field. How to index it? We have binary DocValues updates! Thanks to Shai!

That's what I have so far: I can loop over the "to"-side filter, put the
sequences of related docs into a heap, and then intersect this heap with the
"from"-side filter. As a result, I expect to get yet another join query that
sits between query-time join and block join in terms of query time, and also
between them in terms of update cost.

WDYT about both algorithms?


On Thu, Feb 13, 2014 at 9:09 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Thu, Feb 13, 2014 at 11:49 AM, Mikhail Khludnev
>  wrote:
> > Mike,
> > Thanks for the clue. It raises a lot of questions:
> >  - Is this cost caused by random access nature of the seekCeil() /** The
> > target term may be before or after the current term. */? Is there any
> chance
> > to make it more efficient by requesting "forward only" TermEnum?
>
> Maybe we could get some gains with a "forward only" mode ... not sure.
>  The enum already shares state, i.e. it checks for the common prefix
> b/w the term it's on now and the term you're seeking to.
>
> But also traversing the FST is not cheap.
>
> >  - will it be faster with an 'entirely memory resident term dictionary'?
>
> Likely ...
>
> >  - or the overall idea of using TermEnum just complies with the sub, and
> > it's worth to experiment with writing the previous parent docnum (or
> current
> > block size) in payload and reading it when we need to jump back on
> > advance()?
>
> Maybe?
>
> But I think using FBS is an OK solution too... that's a very fast way
> to find prev/next parent.
>
> >  - once again, would you mind to remind why making DocEnum capable to
> jump
> > back is so hard? Can you recommend any starting point for hacking?
>
> Well, all the impls are heavily "forward only", e.g. we store the doc
> deltas in an int[128] and sum as we go.
>
> But anything we can do to improve block join, or query time join,
> would be great. E.g. for query time join I think this issue would be a
> big win: https://issues.apache.org/jira/browse/LUCENE-4771
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
> -
> To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> For additional commands, e-mail: dev-h...@lucene.apache.org
>
>


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


[jira] [Updated] (SOLR-5656) Add SharedFS Failover option that allows surviving Solr instances to take over serving data for victim Solr instances.

2014-04-29 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-5656:
--

Description: 
When using HDFS, the Overseer should have the ability to reassign the cores 
from failed nodes to running nodes.

Given that the index and transaction logs are in hdfs, it's simple for 
surviving hardware to take over serving cores for failed hardware.

There are some tricky issues around having the Overseer handle this for you, 
but it seems a simple first pass is not too difficult.

This will add another alternative to replicating both with hdfs and solr.

It shouldn't be specific to hdfs, and would be an option for any shared file 
system Solr supports.

  was:
Given that the index and transaction logs are in hdfs, it's simple for 
surviving hardware to take over serving cores for failed hardware.

There are some tricky issues around having the Overseer handle this for you, 
but it seems a simple first pass is not too difficult.

This will add another alternative to replicating both with hdfs and solr.

Summary: Add SharedFS Failover option that allows surviving Solr 
instances to take over serving data for victim Solr instances.  (was: When 
using HDFS, the Overseer should have the ability to reassign the cores from 
failed nodes to running nodes.)

> Add SharedFS Failover option that allows surviving Solr instances to take 
> over serving data for victim Solr instances.
> --
>
> Key: SOLR-5656
> URL: https://issues.apache.org/jira/browse/SOLR-5656
> Project: Solr
>  Issue Type: New Feature
>Reporter: Mark Miller
>Assignee: Mark Miller
>
> When using HDFS, the Overseer should have the ability to reassign the cores 
> from failed nodes to running nodes.
> Given that the index and transaction logs are in hdfs, it's simple for 
> surviving hardware to take over serving cores for failed hardware.
> There are some tricky issues around having the Overseer handle this for you, 
> but it seems a simple first pass is not too difficult.
> This will add another alternative to replicating both with hdfs and solr.
> It shouldn't be specific to hdfs, and would be an option for any shared file 
> system Solr supports.






[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 83822 - Still Failing!

2014-04-29 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/83822/

2 tests failed.
FAILED:  
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce

Error Message:
did not hit exception

Stack Trace:
java.lang.AssertionError: did not hit exception
at 
__randomizedtesting.SeedInfo.seed([32BAF05DA42A99B6:AD30E0544A6B2C46]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexWriterWithThreads.java:338)
at 
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce(TestIndexWriterWithThreads.java:449)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at java.lang.Thread.run(Thread.java:724)


FAILED:  
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegment

Error Message:
did not hit exception

Stack Trace:
java.lang.AssertionError: did not hit exception
at 
__randomizedtesting.SeedInfo.seed([32BAF05DA42A99B6:81CDC1C555FF0E8A]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexW

Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 83820 - Failure!

2014-04-29 Thread Robert Muir
Trying to fix the test screws up the stored fields bits, so I think there
is a real exception-handling bug, in addition to the stack-trace issue:

ant test  -Dtestcase=TestIndexWriterWithThreads
-Dtests.method=testImmediateDiskFullWithThreads
-Dtests.seed=7B80A10209D6E21D -Dtests.slow=true -Dtests.locale=iw
-Dtests.timezone=Europe/Copenhagen -Dtests.file.encoding=US-ASCII


@@ -426,8 +426,9 @@
 public void eval(MockDirectoryWrapper dir)  throws IOException {
   if (doFail) {
 StackTraceElement[] trace = new Exception().getStackTrace();
+new Exception().printStackTrace(System.out);
 for (int i = 0; i < trace.length; i++) {
-  if ("flush".equals(trace[i].getMethodName()) && "org.apache.lucene.index.DocFieldProcessor".equals(trace[i].getClassName())) {
+  if ("flush".equals(trace[i].getMethodName()) && DefaultIndexingChain.class.getName().equals(trace[i].getClassName())) {


On Tue, Apr 29, 2014 at 1:29 PM, Robert Muir  wrote:
> I'm looking at this.
>
> On Tue, Apr 29, 2014 at 1:25 PM,   wrote:
>> Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/83820/
>>
>> 2 tests failed.
>> REGRESSION:  
>> org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce
>>
>> Error Message:
>> did not hit exception
>>
>> Stack Trace:
>> java.lang.AssertionError: did not hit exception
>> at 
>> __randomizedtesting.SeedInfo.seed([D76C3910F82E390F:48E62919166F8CFF]:0)
>> at org.junit.Assert.fail(Assert.java:93)
>> at 
>> org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexWriterWithThreads.java:338)
>> at 
>> org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce(TestIndexWriterWithThreads.java:449)
>> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> at 
>> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>> at 
>> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> at java.lang.reflect.Method.invoke(Method.java:606)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
>> at 
>> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
>> at 
>> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
>> at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> at 
>> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
>> at 
>> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
>> at 
>> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
>> at 
>> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
>> at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
>> at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
>> at 
>> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
>> at 
>> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
>> at 
>> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
>> at 
>> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
>> at 
>> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
>> at 
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>> at 
>> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
>> at 
>> com.carrotsearch.randomizedtesting.rules.Statem

[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 83821 - Still Failing!

2014-04-29 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/83821/

2 tests failed.
FAILED:  
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce

Error Message:
did not hit exception

Stack Trace:
java.lang.AssertionError: did not hit exception
at 
__randomizedtesting.SeedInfo.seed([7ACCA54D60DD2F0:9826DA5D384C6700]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexWriterWithThreads.java:338)
at 
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce(TestIndexWriterWithThreads.java:449)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at java.lang.Thread.run(Thread.java:724)


FAILED:  
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegment

Error Message:
did not hit exception

Stack Trace:
java.lang.AssertionError: did not hit exception
at 
__randomizedtesting.SeedInfo.seed([7ACCA54D60DD2F0:B4DBFBCC27D845CC]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexWri

Re: [JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 83820 - Failure!

2014-04-29 Thread Robert Muir
I'm looking at this.

On Tue, Apr 29, 2014 at 1:25 PM,   wrote:
> Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/83820/
>
> 2 tests failed.
> REGRESSION:  
> org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce
>
> Error Message:
> did not hit exception
>
> Stack Trace:
> java.lang.AssertionError: did not hit exception
> at 
> __randomizedtesting.SeedInfo.seed([D76C3910F82E390F:48E62919166F8CFF]:0)
> at org.junit.Assert.fail(Assert.java:93)
> at 
> org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexWriterWithThreads.java:338)
> at 
> org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce(TestIndexWriterWithThreads.java:449)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
> at 
> org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
> at 
> org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at 
> org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
> at 
> com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
> at 
> org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
> at 
> org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
> at 
> com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at 
> com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
> at 
> org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
> at 
> org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
> at 
> org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
> at 
> com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
> at 
> com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
> at java.lang.Thread.run(Thread.java:724)
>
>
> REGRESSION:  
> org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegment
>
> Error Message:
> did not hit exception
>
> Stack Trace:
> java.lang.AssertionError: did not hit exception

[JENKINS] Lucene-trunk-Linux-Java7-64-test-only - Build # 83820 - Failure!

2014-04-29 Thread builder
Build: builds.flonkings.com/job/Lucene-trunk-Linux-Java7-64-test-only/83820/

2 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce

Error Message:
did not hit exception

Stack Trace:
java.lang.AssertionError: did not hit exception
at 
__randomizedtesting.SeedInfo.seed([D76C3910F82E390F:48E62919166F8CFF]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexWriterWithThreads.java:338)
at 
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegmentOnlyOnce(TestIndexWriterWithThreads.java:449)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$6.evaluate(RandomizedRunner.java:827)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$7.evaluate(RandomizedRunner.java:863)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$8.evaluate(RandomizedRunner.java:877)
at 
org.apache.lucene.util.TestRuleSetupTeardownChained$1.evaluate(TestRuleSetupTeardownChained.java:50)
at 
org.apache.lucene.util.TestRuleFieldCacheSanity$1.evaluate(TestRuleFieldCacheSanity.java:51)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
org.apache.lucene.util.TestRuleThreadAndTestName$1.evaluate(TestRuleThreadAndTestName.java:49)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl.forkTimeoutingTask(ThreadLeakControl.java:793)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$3.evaluate(ThreadLeakControl.java:453)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner.runSingleTest(RandomizedRunner.java:836)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$3.evaluate(RandomizedRunner.java:738)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:772)
at 
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at 
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at 
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at 
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at 
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at 
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at 
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at 
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at 
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:360)
at java.lang.Thread.run(Thread.java:724)


REGRESSION:  
org.apache.lucene.index.TestIndexWriterWithThreads.testIOExceptionDuringWriteSegment

Error Message:
did not hit exception

Stack Trace:
java.lang.AssertionError: did not hit exception
at 
__randomizedtesting.SeedInfo.seed([D76C3910F82E390F:641B088809FBAE33]:0)
at org.junit.Assert.fail(Assert.java:93)
at 
org.apache.lucene.index.TestIndexWriterWithThreads._testSingleThreadFailure(TestIndexWriterWithThreads.java:338)

RE: maxThreads in Jetty

2014-04-29 Thread Toke Eskildsen
Mark Miller [markrmil...@gmail.com] wrote:
> Using separate threads pools is something that I will work on at some point.
> 
> It should be an option with Solr 4X as well.

Thank you. We have two separate setups with heavy faceting that need limits on 
concurrent searches, and it would be nice to have that as part of the Solr 
server instead of outside it.

Regards,
Toke Eskildsen
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5611) Simplify the default indexing chain

2014-04-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984515#comment-13984515
 ] 

ASF subversion and git services commented on LUCENE-5611:
-

Commit 1591025 from [~mikemccand] in branch 'dev/trunk'
[ https://svn.apache.org/r1591025 ]

LUCENE-5611: simplify the default indexing chain

> Simplify the default indexing chain
> ---
>
> Key: LUCENE-5611
> URL: https://issues.apache.org/jira/browse/LUCENE-5611
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5611.patch, LUCENE-5611.patch
>
>
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs.  Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes e.g. the new chain requires all fields to have the
> same TV options (rather than auto-upgrading all fields by the same
> name that the current chain does)...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-29 Thread Timothy Potter (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984507#comment-13984507
 ] 

Timothy Potter commented on SOLR-5473:
--

Hi Mark,

Just wanted to get your input on the watcher approach I described above:

bq. In terms of what's watched and what is not watched, this patch includes 
code from 5474 (as they were too intimately tied together to keep separated) 
which doesn't watch collection state changes on the client side. Instead the 
client relies on a stateVer check during request processing and receives an 
error from the server if the client state is stale. I too think this is a 
little controversial / confusing and maybe we don't have to keep that as part 
of this solution. It was our mistake to merge those two into a single patch. We 
originally were thinking 5474 was needed to keep the number of watchers on a 
znode to a minimum in the event of many clients using many collections. 
However, I do think this feature can be split out and dealt with in a better 
way, if at all. In other words, split state znodes are watched from server and 
client side.

Basically, our approach is that each core on the cluster side only watches the 
state znode for the collection it participates in. On the client side, the 
CloudSolrServer does not watch any state znodes; instead it relies on cached 
DocCollections and stale-state checks on the server side when processing client 
requests. Do you have any concerns with us moving forward with this approach? 
Alternatively, the CloudSolrServer on the client side could use watchers on 
state znodes after dynamically fetching them from the server when a request for 
a new collection comes in.
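
To make that concrete, here is a toy sketch of the stale-state check (plain 
Java, not Solr's actual classes; class and method names are made up for 
illustration): the client caches per-collection state with a version, sends the 
version along with each request, and the server rejects the request when its 
own version is newer, which triggers an invalidate/refetch/retry on the client.

{code:java}
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class StateVerSketch {

  // server side: authoritative state versions, normally backed by ZooKeeper
  private static final Map<String, Integer> serverVersions = new ConcurrentHashMap<String, Integer>();
  // client side: cached versions (stands in for cached DocCollections), no watchers
  private static final Map<String, Integer> clientCache = new ConcurrentHashMap<String, Integer>();

  static void serverHandle(String collection, int clientStateVer) {
    int current = serverVersions.get(collection);
    if (clientStateVer != current) {
      // stands in for the INVALID_NODE-style error discussed above
      throw new IllegalStateException("stale state for " + collection);
    }
    System.out.println("request served for " + collection + " at version " + current);
  }

  static void clientRequest(String collection) {
    Integer cached = clientCache.get(collection);
    if (cached == null) {
      cached = serverVersions.get(collection); // "fetch" state.json on first use
      clientCache.put(collection, cached);
    }
    try {
      serverHandle(collection, cached);
    } catch (IllegalStateException stale) {
      Integer fresh = serverVersions.get(collection); // invalidate + refetch
      clientCache.put(collection, fresh);
      serverHandle(collection, fresh);                // retry once
    }
  }

  public static void main(String[] args) {
    serverVersions.put("xcoll", 34);
    clientRequest("xcoll");          // fetches version 34 and succeeds
    serverVersions.put("xcoll", 35); // collection state changes on the cluster
    clientRequest("xcoll");          // stale check fails; client refetches and retries
  }
}
{code}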

> Make one state.json per collection
> --
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_undo.patch, 
> ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5591) ReaderAndUpdates should create a proper IOContext when writing DV updates

2014-04-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984473#comment-13984473
 ] 

Michael McCandless commented on LUCENE-5591:


Looks great Shai, thanks!

Is avgUpdateSize supposed to be "bytes per doc"?  If so, instead of 
bitsPerValue, shouldn't we return bitsPerValue/8, maybe rounded up to the 
nearest byte?  Should we rename the method ... maybe ramBytesPerDoc or 
something?

Shouldn't BinaryDocValuesFieldUpdates.avgUpdateSize also include the 
docs/offsets/lengths RAM used too?

Separately, I noticed BinaryDocValuesFieldUpdates's add method is doing a 
BytesRef.append of each added value ... isn't this slowish (O(N^2) where N = 
number of docs that have been updated)?  BytesRef.append doesn't use 
ArrayUtil.grow to size the array on overflow...
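
As a side note on that last point, here is a tiny standalone comparison (plain 
Java, nothing Lucene-specific; sizes are made up) showing why reallocating to 
exactly the needed size on every append copies O(N^2) bytes overall, while 
oversized, ArrayUtil.grow-style growth keeps the total copying amortized linear:

{code:java}
public class AppendGrowth {

  // grow to exactly the needed size on every append (the naive approach)
  static long appendExact(int appends, int chunk) {
    byte[] buf = new byte[0];
    int used = 0;
    long copied = 0;
    for (int i = 0; i < appends; i++) {
      byte[] next = new byte[used + chunk];          // reallocate every time
      System.arraycopy(buf, 0, next, 0, used);
      copied += used;
      buf = next;
      used += chunk;
    }
    return copied;
  }

  // oversize on overflow, so copying is amortized linear
  static long appendAmortized(int appends, int chunk) {
    byte[] buf = new byte[8];
    int used = 0;
    long copied = 0;
    for (int i = 0; i < appends; i++) {
      if (used + chunk > buf.length) {
        byte[] next = new byte[Math.max(buf.length * 2, used + chunk)];
        System.arraycopy(buf, 0, next, 0, used);
        copied += used;
        buf = next;
      }
      used += chunk;
    }
    return copied;
  }

  public static void main(String[] args) {
    int n = 100000;
    System.out.println("exact growth copied bytes:     " + appendExact(n, 8));
    System.out.println("amortized growth copied bytes: " + appendAmortized(n, 8));
  }
}
{code}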

> ReaderAndUpdates should create a proper IOContext when writing DV updates
> -
>
> Key: LUCENE-5591
> URL: https://issues.apache.org/jira/browse/LUCENE-5591
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Shai Erera
> Attachments: LUCENE-5591.patch
>
>
> Today we pass IOContext.DEFAULT. If DV updates are used in conjunction w/ 
> NRTCachingDirectory, it means the latter will attempt to write the entire DV 
> field in its RAMDirectory, which could lead to OOM.
> Would be good if we can build our own FlushInfo, estimating the number of 
> bytes we're about to write. I didn't see off hand a quick way to guesstimate 
> that - I thought to use the current DV's sizeInBytes as an approximation, but 
> I don't see a way to get it, not a direct way at least.
> Maybe we can use the size of the in-memory updates to guesstimate that 
> amount? Something like {{sizeOfInMemUpdates * (maxDoc/numUpdatedDocs)}}? Is 
> it a too wild guess?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6032) NgramFilter dont keep token less than mingram size or greater than maxgram size

2014-04-29 Thread Ahmet Arslan (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984462#comment-13984462
 ] 

Ahmet Arslan commented on SOLR-6032:


NGramFilter does not have a preserveOriginal parameter. Your field type should 
throw {{new IllegalArgumentException("Unknown parameters: preserveOriginal")}}.

In your case it is recommended to create an additional field (populated via 
copyField) without NGramFilter.

And it is always a good idea to ask questions on the user list / IRC before 
opening a JIRA ticket.

Please see similar discussions.

https://issues.apache.org/jira/browse/SOLR-5332
https://issues.apache.org/jira/browse/SOLR-5152
https://issues.apache.org/jira/browse/LUCENE-5620

> NgramFilter dont keep token less than mingram size or greater than maxgram 
> size
> ---
>
> Key: SOLR-6032
> URL: https://issues.apache.org/jira/browse/SOLR-6032
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 4.2.1, 4.6.1
> Environment: Ubuntu12.04,4GB RAM, Quadcore Processor
>Reporter: Kuntal Ganguly
>  Labels: build, patch
>   Original Estimate: 48h
>  Remaining Estimate: 48h
>
> I have a requirement for partial and exact type.Now partial search work fine 
> for NgramFilter within mingram & maxgram size range. Now when im trying to 
> index a value less than mingram size,the tokens are not generated .Same 
> things happens when the value is greater than maxgramsize.
> I haveto created a field type as shown below:
>  positionIncrementGap="100">
> 
>   
>   
>   
>   
>   
>   
>maxGramSize="6" preserveOriginal="true"/>
>   
> 
> when i'm trying to index a value say AB (it is not indexed and not 
> searchable). Similarly if the value is GangulyKuntal (which is greater than 
> maxgram size),the search is not working.
> **Increasing maxgram size to more than the anticipated value is not good 
> design aspect.
> NgramFilter should keep the original tokens if it is less than mingram or 
> greater than maxgram. By doing this it will make it truly partial as well as 
> exact search solution.It would really be very helpful,if this changes are 
> made in the coming release. Any suggestion will be of great help?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

2014-04-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984459#comment-13984459
 ] 

Michael McCandless commented on LUCENE-4396:


Thanks Da, this looks neat!

Hmm, the patch didn't cleanly apply, but I was able to work through
it.  I think your dev area is not up to date with trunk?

Small code style things: can you try to add \{ .. \} around the
true/else body of if statements, even if they are only one line?
And also no whitespace around the condition.  E.g. instead of:

{noformat}
  if ( required.size() > 0 )
return new BooleanNovelScorer(this, disableCoord, minNrShouldMatch, 
required, optional, prohibited, maxCoord);
{noformat}

do this:

{noformat}
  if (required.size() > 0) {
return new BooleanNovelScorer(this, disableCoord, minNrShouldMatch, 
required, optional, prohibited, maxCoord);
  }
{noformat}

So it looks like BooleanNovelScorer is able to be a Scorer because the
linked-list of visited buckets in one window is guaranteed to be in
docID order, since we first visit the requiredConjunctionScorer's
docs in that window.

Have you tested performance when the .advance method here isn't called?
Ie, just boolean queries w/ one MUST and one or more SHOULD?  I think
the important question here is whether/in what cases the
BooleanNovelScorer approach beats BooleanScorer2 performance?

I realized LUCENE-4872 is related here, i.e. we should also sometimes
use BooleanScorer for the minShouldMatch>1 case.


> BooleanScorer should sometimes be used for MUST clauses
> ---
>
> Key: LUCENE-4396
> URL: https://issues.apache.org/jira/browse/LUCENE-4396
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael McCandless
> Attachments: LUCENE-4396.patch, LUCENE-4396.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 100 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



RE: maxThreads in Jetty

2014-04-29 Thread Mark Miller
Using separate threads pools is something that I will work on at some point.

It should be an option with Solr 4X as well.
-- 
Mark Miller
about.me/markrmiller

On April 27, 2014 at 2:49:08 PM, Toke Eskildsen (t...@statsbiblioteket.dk) 
wrote:

 Is something like that on the drawing board for Solr 5? 

[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard

2014-04-29 Thread Daniele Madama (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984423#comment-13984423
 ] 

Daniele Madama commented on SOLR-4414:
--

Great!






-- 
The box said "Requires Windows XP or better"  so I installed Linux !
-o=|=o-

Daniele Madama
http://www.danysoft.org
skype: daniele_madama


> MoreLikeThis on a shard finds no interesting terms if the document queried is 
> not in that shard
> ---
>
> Key: SOLR-4414
> URL: https://issues.apache.org/jira/browse/SOLR-4414
> Project: Solr
>  Issue Type: Bug
>  Components: MoreLikeThis, SolrCloud
>Affects Versions: 4.1
>Reporter: Colin Bartolome
>
> Running a MoreLikeThis query in a cloud works only when the document being 
> queried exists in whatever shard serves the request. If the document is not 
> present in the shard, no "interesting terms" are found and, consequently, no 
> matches are found.
> h5. Steps to reproduce
> * Edit example/solr/collection1/conf/solrconfig.xml and add this line, with 
> the rest of the request handlers:
> {code:xml}
> 
> {code}
> * Follow the [simplest SolrCloud 
> example|http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster]
>  to get two shards running.
> * Hit this URL: 
> [http://localhost:8983/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1]
> * Compare that output to that of this URL: 
> [http://localhost:7574/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1]
> The former URL will return a result and list some interesting terms. The 
> latter URL will return no results and list no interesting terms. It will also 
> show this odd XML element:
> {code:xml}
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6022) Rename getAnalyzer to getIndexAnalyzer

2014-04-29 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-6022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984352#comment-13984352
 ] 

Tomás Fernández Löbbe commented on SOLR-6022:
-

bq. The patch for 4.x has to be applied after merge from trunk
Got it, thanks for explaining

> Rename getAnalyzer to getIndexAnalyzer
> --
>
> Key: SOLR-6022
> URL: https://issues.apache.org/jira/browse/SOLR-6022
> Project: Solr
>  Issue Type: Improvement
>Reporter: Ryan Ernst
> Attachments: SOLR-6022.branch_4x-deprecation.patch, SOLR-6022.patch, 
> SOLR-6022.patch
>
>
> We have separate index/query analyzer chains, but the access methods for the 
> analyzers do not match up with the names.  This can lead to unknowingly using 
> the wrong analyzer chain (as it did in SOLR-6017).  We should do this 
> renaming in trunk, and deprecate the old getAnalyzer function in 4x.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-6033) LeftOuter Join capabilty in SOLR and dynamic field merge in response

2014-04-29 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984342#comment-13984342
 ] 

Mikhail Khludnev edited comment on SOLR-6033 at 4/29/14 2:28 PM:
-

This is a question for the mailing list; please close the issue.
You can "add" docs into the result by specifying a SHOULD clause.

E.g. if you have

q=\{!join from=p_id to=id\}content:foo

you can do a "left outer"-like join by adding an optional clause:

q=type:entity1 \{!join from=p_id to=id\}content:foo





was (Author: mkhludnev):
it's a question to the mailing list, please close the issue.
you can "add" docs into result by specifying should clause

eg if you have q={!join from=p_id to=id}content:foo
you can do "left outer" like join by adding optional clause 
q=type:entity1 {!join from=p_id to=id}content:foo




> LeftOuter Join capabilty in SOLR and dynamic field merge in response
> 
>
> Key: SOLR-6033
> URL: https://issues.apache.org/jira/browse/SOLR-6033
> Project: Solr
>  Issue Type: New Feature
>  Components: documentation, search
>Affects Versions: 4.2.1, 4.3, 4.5.1, 4.6.1
> Environment: RedHat Linux, 6GB Ram, Core2Duo Processor
>Reporter: Kuntal Ganguly
>  Labels: build, feature, patch
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I'm having different kind of entity in the index.
> Entitity-1: id, doc_name, type, pinprojectid, documentid, content
> Entity-2: id, proj_name,projtype,type,pinprojectid
> where type is unique for every different entity e.g. Entity-1(type=Documents) 
> & Entity-2(type=Projects).pinprojectid is common between two Entity.
> Now im trying to search on type:Document AND content:"hello",
> but the result do left outer join with Entity-2 based on join field say ( 
> pinprojectid) and fetches few information like(projtype,proj_name) and 
> display in the Entity-1 response.
> Say Entity-1 search gives 12 result,but left-outer join field fetch matches 
> with 10 result.
> So the final output should be 12 with 10 doc containing extra merge fields 
> through leftouter join.
> This is very common in SQL.
> One way to do this to process from client side with two separate call to SOLR 
> server.But this functaility or enhancement or added feature needs to there in 
> solr release in a generalized way.
> Let me know if there is any other way to achieve the above scenario from 
> server side of Solr in one call??



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6033) LeftOuter Join capabilty in SOLR and dynamic field merge in response

2014-04-29 Thread Mikhail Khludnev (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984342#comment-13984342
 ] 

Mikhail Khludnev commented on SOLR-6033:


This is a question for the mailing list; please close the issue.
You can "add" docs into the result by specifying a SHOULD clause.

E.g. if you have q={!join from=p_id to=id}content:foo
you can do a "left outer"-like join by adding an optional clause:
q=type:entity1 {!join from=p_id to=id}content:foo
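
For anyone trying this from a client, a minimal SolrJ 4.x sketch of the same 
query (the core URL and row count are assumptions; the query string and field 
names are taken from the example above):

{code:java}
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class LeftOuterLikeJoin {
  public static void main(String[] args) throws SolrServerException {
    // URL is an assumption; point it at your own core/collection
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery q = new SolrQuery();
    // all entity1 docs, plus an optional (SHOULD) join clause that also brings in
    // docs whose joined entity2 side matches content:foo
    q.setQuery("type:entity1 {!join from=p_id to=id}content:foo");
    q.setRows(20);

    QueryResponse rsp = solr.query(q);
    System.out.println("hits: " + rsp.getResults().getNumFound());
  }
}
{code}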




> LeftOuter Join capabilty in SOLR and dynamic field merge in response
> 
>
> Key: SOLR-6033
> URL: https://issues.apache.org/jira/browse/SOLR-6033
> Project: Solr
>  Issue Type: New Feature
>  Components: documentation, search
>Affects Versions: 4.2.1, 4.3, 4.5.1, 4.6.1
> Environment: RedHat Linux, 6GB Ram, Core2Duo Processor
>Reporter: Kuntal Ganguly
>  Labels: build, feature, patch
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> I'm having different kind of entity in the index.
> Entitity-1: id, doc_name, type, pinprojectid, documentid, content
> Entity-2: id, proj_name,projtype,type,pinprojectid
> where type is unique for every different entity e.g. Entity-1(type=Documents) 
> & Entity-2(type=Projects).pinprojectid is common between two Entity.
> Now im trying to search on type:Document AND content:"hello",
> but the result do left outer join with Entity-2 based on join field say ( 
> pinprojectid) and fetches few information like(projtype,proj_name) and 
> display in the Entity-1 response.
> Say Entity-1 search gives 12 result,but left-outer join field fetch matches 
> with 10 result.
> So the final output should be 12 with 10 doc containing extra merge fields 
> through leftouter join.
> This is very common in SQL.
> One way to do this to process from client side with two separate call to SOLR 
> server.But this functaility or enhancement or added feature needs to there in 
> solr release in a generalized way.
> Let me know if there is any other way to achieve the above scenario from 
> server side of Solr in one call??



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5611) Simplify the default indexing chain

2014-04-29 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984312#comment-13984312
 ] 

Michael McCandless commented on LUCENE-5611:


Thanks Rob, I'll merge back to trunk and commit, and work on backporting to 4.x 
... will take time because there is no separate StoredDocument there.

> Simplify the default indexing chain
> ---
>
> Key: LUCENE-5611
> URL: https://issues.apache.org/jira/browse/LUCENE-5611
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5611.patch, LUCENE-5611.patch
>
>
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs.  Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes e.g. the new chain requires all fields to have the
> same TV options (rather than auto-upgrading all fields by the same
> name that the current chain does)...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5474) Have a new mode for SolrJ to support stateFormat=2

2014-04-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984309#comment-13984309
 ] 

ASF subversion and git services commented on SOLR-5474:
---

Commit 1590983 from [~noble.paul] in branch 'dev/branches/solr5473'
[ https://svn.apache.org/r1590983 ]

Creating a branch to sort out SOLR-5473 , SOLR-5474

> Have a new mode for SolrJ to support stateFormat=2
> --
>
> Key: SOLR-5474
> URL: https://issues.apache.org/jira/browse/SOLR-5474
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5474.patch, SOLR-5474.patch, SOLR-5474.patch, 
> fail.logs
>
>
> In this mode SolrJ would not watch any ZK node
> It fetches the state  on demand and cache the most recently used n 
> collections in memory.
> SolrJ would not listen to any ZK node. When a request comes for a collection 
> ‘xcoll’
> it would first check if such a collection exists
> If yes it first looks up the details in the local cache for that collection
> If not found in cache , it fetches the node /collections/xcoll/state.json and 
> caches the information
> Any query/update will be sent with extra query param specifying the 
> collection name , version (example \_stateVer=xcoll:34) . A node would throw 
> an error (INVALID_NODE) if it does not have the right version
> If SolrJ gets INVALID_NODE error it would invalidate the cache and fetch 
> fresh state information for that collection (and caches it again)
> If there is a connection timeout, SolrJ assumes the node is down and re-fetch 
> the state for the collection and try again



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984308#comment-13984308
 ] 

ASF subversion and git services commented on SOLR-5473:
---

Commit 1590983 from [~noble.paul] in branch 'dev/branches/solr5473'
[ https://svn.apache.org/r1590983 ]

Creating a branch to sort out SOLR-5473 , SOLR-5474

> Make one state.json per collection
> --
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_undo.patch, 
> ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5633) NoMergePolicy should have one singleton - NoMergePolicy.INSTANCE

2014-04-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984302#comment-13984302
 ] 

Shai Erera commented on LUCENE-5633:


To repeat here what I wrote on the thread, if we change to 
NoMergePolicy.INSTANCE, we should fix its useCompoundFile to return 
newSegment.info.isCompound.

> NoMergePolicy should have one singleton - NoMergePolicy.INSTANCE
> 
>
> Key: LUCENE-5633
> URL: https://issues.apache.org/jira/browse/LUCENE-5633
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Varun Thacker
>Priority: Minor
>
> Currently there are two singletons available - MergePolicy.NO_COMPOUND_FILES 
> and MergePolicy.COMPOUND_FILES and it's confusing to distinguish on compound 
> files when the merge policy never merges segments. 
> We should have one singleton - NoMergePolicy.INSTANCE
> Post to the relevant discussion - 
> http://mail-archives.apache.org/mod_mbox/lucene-java-user/201404.mbox/%3CCAOdYfZXXyVSf9%2BxYaRhr5v2O4Mc6S2v-qWuT112_CJFYhWTPqw%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Re: VOTE: RC1 Release apache-solr-ref-guide-4.8.pdf

2014-04-29 Thread Steve Rowe
This vote has passed.  Thanks everybody for voting.

Hoss asked me offline to finish the release - I’ll start doing that now.

Steve

On Apr 29, 2014, at 9:19 AM, Erik Hatcher  wrote:

> +1, apologies for delay (e-mail overfloweth)
> 
>   Erik
> 
> On Apr 25, 2014, at 5:38 PM, Chris Hostetter  wrote:
> 
>> 
>> (Note: cross posted to general, please confine replies to dev@lucene)
>> 
>> Please VOTE to release the following RC1 as apache-solr-ref-guide-4.8.pdf ...
>> 
>> https://dist.apache.org/repos/dist/dev/lucene/solr/ref-guide/apache-solr-ref-guide-4.8-RC1
>> 
>> 
>> The notes I previously mentioned regarding RC0 apply to this RC as well...
>> 
>> 1) Due to a known bug in Confluence, the PDFs it generates are much bigger 
>> than they should be.  This bug has been fixed in the latest version of 
>> Confluence, but cwiki.apache.org has not yet been updated.  For that reason, 
>> I have manually run a small tool against the PDF to "fix" the size (see 
>> SOLR-5819).  The first time I tried this approach, it inadvertently removed 
>> the "Index" (aka: Table of Contents, or Bookmarks, depending on what PDF 
>> reader client you use).  I've already fixed this, but if you notice anything 
>> else unusual about this PDF compared to previous versions, please speak up so 
>> we can see if it's a result of this post-processing and try to fix it.
>> 
>> 2) This is the first ref guide release where we've started using a special 
>> confluence macro for any lucene/solr javadoc links.  The up side is that all 
>> javadoc links in this 4.8 ref guide will now correctly point to the 4.8 
>> javadocs on lucene.apache.org -- the down side is that this means none of 
>> those links currently work, since the 4.8 code release is still ongoing and 
>> the website has not yet been updated.
>> 
>> Because of #2, I intend to leave this ref guide vote open until the 4.8 code 
>> release is final - that way we won't officially be releasing this doc until 
>> the 4.8 javadocs are uploaded and all the links work properly.
>> 
>> 
>> 
>> -Hoss
>> http://www.lucidworks.com/
> 


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5611) Simplify the default indexing chain

2014-04-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984299#comment-13984299
 ] 

Robert Muir commented on LUCENE-5611:
-

I think another followup for this issue should be to do something about all the 
conflicting term vector option possibilities. Maybe it should have something 
more like IndexOptions. Just something to think about.

Anyway, I did benchmarking and reviewing; +1 to commit the change. It's way 
simpler and easier to work with.

> Simplify the default indexing chain
> ---
>
> Key: LUCENE-5611
> URL: https://issues.apache.org/jira/browse/LUCENE-5611
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Assignee: Michael McCandless
> Fix For: 4.9, 5.0
>
> Attachments: LUCENE-5611.patch, LUCENE-5611.patch
>
>
> I think Lucene's current indexing chain has too many classes /
> hierarchy / abstractions, making it look much more complex than it
> really should be, and discouraging users from experimenting/innovating
> with their own indexing chains.
> Also, if it were easier to understand/approach, then new developers
> would more likely try to improve it ... it really should be simpler.
> So I'm exploring a pared back indexing chain, and have a starting patch
> that I think is looking ok: it seems more approachable than the
> current indexing chain, or at least has fewer strange classes.
> I also thought this could give some speedup for tiny documents (a more
> common use of Lucene lately), and it looks like, with the evil
> optimizations, this is a ~25% speedup for Geonames docs.  Even without
> those evil optos it's a bit faster.
> This is very much a work in progress / nocommits, and there are some
> behavior changes e.g. the new chain requires all fields to have the
> same TV options (rather than auto-upgrading all fields by the same
> name that the current chain does)...



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6033) LeftOuter Join capabilty in SOLR and dynamic field merge in response

2014-04-29 Thread Kuntal Ganguly (JIRA)
Kuntal Ganguly created SOLR-6033:


 Summary: LeftOuter Join capabilty in SOLR and dynamic field merge 
in response
 Key: SOLR-6033
 URL: https://issues.apache.org/jira/browse/SOLR-6033
 Project: Solr
  Issue Type: New Feature
  Components: documentation, search
Affects Versions: 4.6.1, 4.5.1, 4.3, 4.2.1
 Environment: RedHat Linux, 6GB Ram, Core2Duo Processor
Reporter: Kuntal Ganguly


I have different kinds of entities in the index.
Entity-1: id, doc_name, type, pinprojectid, documentid, content

Entity-2: id, proj_name, projtype, type, pinprojectid

where type is unique for each entity, e.g. Entity-1 (type=Documents) and 
Entity-2 (type=Projects). pinprojectid is common between the two entities.
Now I'm trying to search on type:Document AND content:"hello",
but the result should do a left outer join with Entity-2 based on a join field 
(say pinprojectid), fetch a few fields (projtype, proj_name), and display them 
in the Entity-1 response.

Say the Entity-1 search gives 12 results, but the left-outer-join field matches 
only 10 of them.
So the final output should be 12 docs, with 10 of them carrying extra merged 
fields from the left outer join.
This is very common in SQL.

One way to do this is to process it on the client side with two separate calls 
to the SOLR server, but this functionality or enhancement needs to be in a Solr 
release in a generalized way.

Let me know if there is any other way to achieve the above scenario from the 
server side of Solr in one call.





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-5633) NoMergePolicy should have one singleton - NoMergePolicy.INSTANCE

2014-04-29 Thread Varun Thacker (JIRA)
Varun Thacker created LUCENE-5633:
-

 Summary: NoMergePolicy should have one singleton - 
NoMergePolicy.INSTANCE
 Key: LUCENE-5633
 URL: https://issues.apache.org/jira/browse/LUCENE-5633
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Varun Thacker
Priority: Minor


Currently there are two singletons available - MergePolicy.NO_COMPOUND_FILES 
and MergePolicy.COMPOUND_FILES and it's confusing to distinguish on compound 
files when the merge policy never merges segments. 

We should have one singleton - NoMergePolicy.INSTANCE

Post to the relevant discussion - 
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201404.mbox/%3CCAOdYfZXXyVSf9%2BxYaRhr5v2O4Mc6S2v-qWuT112_CJFYhWTPqw%40mail.gmail.com%3E



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-5473) Make one state.json per collection

2014-04-29 Thread Steve Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984287#comment-13984287
 ] 

Steve Rowe commented on SOLR-5473:
--

bq. Where r we going to create the new branch? SVN ? git?

Lucene/Solr is developed on SVN.  Make a new branch named SOLR-5473 (or 
solr5473) under https://svn.apache.org/repos/asf/lucene/dev/branches/

> Make one state.json per collection
> --
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_undo.patch, 
> ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-6032) NgramFilter dont keep token less than mingram size or greater than maxgram size

2014-04-29 Thread Kuntal Ganguly (JIRA)
Kuntal Ganguly created SOLR-6032:


 Summary: NgramFilter dont keep token less than mingram size or 
greater than maxgram size
 Key: SOLR-6032
 URL: https://issues.apache.org/jira/browse/SOLR-6032
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 4.6.1, 4.2.1
 Environment: Ubuntu12.04,4GB RAM, Quadcore Processor
Reporter: Kuntal Ganguly


I have a requirement for partial and exact search. Partial search works fine 
with NgramFilter within the mingram and maxgram size range. But when I try to 
index a value shorter than the mingram size, no tokens are generated. The same 
thing happens when the value is longer than the maxgram size.


I have created a field type as shown below:













When I try to index a value such as AB, it is not indexed and not searchable. 
Similarly, if the value is GangulyKuntal (which is longer than the maxgram 
size), the search does not work.

**Increasing the maxgram size beyond the anticipated value is not a good design 
choice.

NgramFilter should keep the original token if it is shorter than mingram or 
longer than maxgram. That would make it a truly partial as well as exact search 
solution. It would be very helpful if this change were made in a coming release. 
Any suggestion would be of great help.





--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5473) Make one state.json per collection

2014-04-29 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5473:
-

Attachment: SOLR-5473_undo.patch

patch Undoing the commit

I'm going to commit this straight away if there is no further comment

Where are we going to create the new branch? SVN? Git?

> Make one state.json per collection
> --
>
> Key: SOLR-5473
> URL: https://issues.apache.org/jira/browse/SOLR-5473
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Reporter: Noble Paul
>Assignee: Noble Paul
> Fix For: 5.0
>
> Attachments: SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473-74.patch, SOLR-5473-74.patch, 
> SOLR-5473-74.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, 
> SOLR-5473.patch, SOLR-5473.patch, SOLR-5473.patch, SOLR-5473_undo.patch, 
> ec2-23-20-119-52_solr.log, ec2-50-16-38-73_solr.log
>
>
> As defined in the parent issue, store the states of each collection under 
> /collections/collectionname/state.json node



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard

2014-04-29 Thread Simone Gianni (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984261#comment-13984261
 ] 

Simone Gianni commented on SOLR-4414:
-

Managed to work around this bug by using the TermVectorComponent (which is 
properly sharded) to fetch term vectors and adapting the query-generation code 
in Lucene's MoreLikeThis component to build the MLT query client side. It's 
two calls (one for the term vectors and then one to run the MLT query), but 
it works and is fully sharded.

Just a hint on how it can be worked around. 
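
A rough SolrJ sketch of the second call (building and running the MLT-style 
query once the interesting terms are known): the core URL, terms and weights 
below are placeholders standing in for whatever the TermVectorComponent (/tvrh) 
response yields; the field name "includes" and id "3007WFP" come from the 
issue's example.

{code:java}
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class ClientSideMlt {
  public static void main(String[] args) throws SolrServerException {
    // URL is an assumption; point it at your own core/collection
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // placeholder "interesting terms"; in practice these would be extracted
    // from a TermVectorComponent request for the source document
    Map<String, Float> interestingTerms = new LinkedHashMap<String, Float>();
    interestingTerms.put("monitor", 2.0f);
    interestingTerms.put("display", 1.5f);
    interestingTerms.put("widescreen", 1.0f);

    // build a boosted OR query over the terms, roughly like MoreLikeThis does
    StringBuilder mlt = new StringBuilder();
    for (Map.Entry<String, Float> t : interestingTerms.entrySet()) {
      if (mlt.length() > 0) {
        mlt.append(" OR ");
      }
      mlt.append("includes:").append(t.getKey()).append("^").append(t.getValue());
    }

    SolrQuery q = new SolrQuery(mlt.toString());
    q.addFilterQuery("-id:3007WFP"); // exclude the source document itself
    q.setRows(10);

    QueryResponse rsp = solr.query(q);
    System.out.println("similar docs: " + rsp.getResults().getNumFound());
  }
}
{code}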

> MoreLikeThis on a shard finds no interesting terms if the document queried is 
> not in that shard
> ---
>
> Key: SOLR-4414
> URL: https://issues.apache.org/jira/browse/SOLR-4414
> Project: Solr
>  Issue Type: Bug
>  Components: MoreLikeThis, SolrCloud
>Affects Versions: 4.1
>Reporter: Colin Bartolome
>
> Running a MoreLikeThis query in a cloud works only when the document being 
> queried exists in whatever shard serves the request. If the document is not 
> present in the shard, no "interesting terms" are found and, consequently, no 
> matches are found.
> h5. Steps to reproduce
> * Edit example/solr/collection1/conf/solrconfig.xml and add this line, with 
> the rest of the request handlers:
> {code:xml}
> 
> {code}
> * Follow the [simplest SolrCloud 
> example|http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster]
>  to get two shards running.
> * Hit this URL: 
> [http://localhost:8983/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1]
> * Compare that output to that of this URL: 
> [http://localhost:7574/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1]
> The former URL will return a result and list some interesting terms. The 
> latter URL will return no results and list no interesting terms. It will also 
> show this odd XML element:
> {code:xml}
> 
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6029) CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc has been deleted from a segment

2014-04-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984250#comment-13984250
 ] 

ASF subversion and git services commented on SOLR-6029:
---

Commit 1590965 from [~joel.bernstein] in branch 'dev/trunk'
[ https://svn.apache.org/r1590965 ]

SOLR-6029: Updated CHANGES.txt

> CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc 
> has been deleted from a segment
> -
>
> Key: SOLR-6029
> URL: https://issues.apache.org/jira/browse/SOLR-6029
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 4.7.1
>Reporter: Greg Harris
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.8.1, 4.9
>
> Attachments: SOLR-6029.patch
>
>
> CollapsingQParserPlugin misidentifies if a document is not found in a segment 
> if the docid previously existed in a segment ie was deleted. 
> Relevant code bit from CollapsingQParserPlugin needs to be changed from:
> -if(doc != -1) {
> +if((doc != -1) && (doc != DocsEnum.NO_MORE_DOCS)) {
> What happens is if the doc is not found the returned value is 
> DocsEnum.NO_MORE_DOCS. This would then get set in the fq bitSet array as the 
> doc location causing an ArrayIndexOutOfBoundsException as the array is only 
> as big as maxDocs. 
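
For context, a minimal standalone illustration of the guard (plain Java, not 
the actual CollapsingQParserPlugin code; the advance() stand-in and doc ids are 
made up): NO_MORE_DOCS is Integer.MAX_VALUE, so using an unchecked advance() 
result as an index into a maxDoc-sized array is exactly what triggers the 
ArrayIndexOutOfBoundsException.

{code:java}
public class NoMoreDocsGuard {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE; // same sentinel value DocsEnum uses

  // stand-in for DocsEnum.advance(): first doc >= target, or the sentinel
  static int advance(int[] sortedDocs, int target) {
    for (int d : sortedDocs) {
      if (d >= target) {
        return d;
      }
    }
    return NO_MORE_DOCS;
  }

  public static void main(String[] args) {
    int maxDoc = 100;
    boolean[] collapsed = new boolean[maxDoc];
    int[] segmentDocs = {3, 17, 42}; // the elevated doc (say 57) was deleted from this segment

    int doc = advance(segmentDocs, 57);
    if (doc != -1 && doc != NO_MORE_DOCS) { // the guarded check from the patch
      collapsed[doc] = true;
    } else {
      System.out.println("doc not in this segment; skip instead of indexing with " + doc);
    }
  }
}
{code}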



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-6029) CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc has been deleted from a segment

2014-04-29 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984253#comment-13984253
 ] 

ASF subversion and git services commented on SOLR-6029:
---

Commit 1590966 from [~joel.bernstein] in branch 'dev/branches/branch_4x'
[ https://svn.apache.org/r1590966 ]

SOLR-6029: Updated CHANGES.txt

> CollapsingQParserPlugin throws ArrayIndexOutOfBoundsException if elevated doc 
> has been deleted from a segment
> -
>
> Key: SOLR-6029
> URL: https://issues.apache.org/jira/browse/SOLR-6029
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 4.7.1
>Reporter: Greg Harris
>Assignee: Joel Bernstein
>Priority: Minor
> Fix For: 4.8.1, 4.9
>
> Attachments: SOLR-6029.patch
>
>
> CollapsingQParserPlugin misidentifies if a document is not found in a segment 
> if the docid previously existed in a segment ie was deleted. 
> Relevant code bit from CollapsingQParserPlugin needs to be changed from:
> -if(doc != -1) {
> +if((doc != -1) && (doc != DocsEnum.NO_MORE_DOCS)) {
> What happens is if the doc is not found the returned value is 
> DocsEnum.NO_MORE_DOCS. This would then get set in the fq bitSet array as the 
> doc location causing an ArrayIndexOutOfBoundsException as the array is only 
> as big as maxDocs. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded

2014-04-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984241#comment-13984241
 ] 

Noble Paul edited comment on SOLR-5681 at 4/29/14 12:29 PM:


Fixed OCPTest error

peekTopN() still does not take care of the problem I reported earlier


was (Author: noble.paul):
FIxed OCPTest error

peekN() still does not take care of the problem I reported earlier

> Make the OverseerCollectionProcessor multi-threaded
> ---
>
> Key: SOLR-5681
> URL: https://issues.apache.org/jira/browse/SOLR-5681
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
> Attachments: SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, 
> SOLR-5681.patch
>
>
> Right now, the OverseerCollectionProcessor is single threaded i.e submitting 
> anything long running would have it block processing of other mutually 
> exclusive tasks.
> When OCP tasks become optionally async (SOLR-5477), it'd be good to have 
> truly non-blocking behavior by multi-threading the OCP itself.
> For example, a ShardSplit call on Collection1 would block the thread and 
> thereby, not processing a create collection task (which would stay queued in 
> zk) though both the tasks are mutually exclusive.
> Here are a few of the challenges:
> * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An 
> easy way to handle that is to only let 1 task per collection run at a time.
> * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
> The task from the workQueue is only removed on completion so that in case of 
> a failure, the new Overseer can re-consume the same task and retry. A queue 
> is not the right data structure in the first place to look ahead i.e. get the 
> 2nd task from the queue when the 1st one is in process. Also, deleting tasks 
> which are not at the head of a queue is not really an 'intuitive' thing.
> Proposed solutions for task management:
> * Task funnel and peekAfter(): The parent thread is responsible for getting 
> and passing the request to a new thread (or one from the pool). The parent 
> method uses a peekAfter(last element) instead of a peek(). The peekAfter 
> returns the task after the 'last element'. Maintain this request information 
> and use it for deleting/cleaning up the workQueue.
> * Another (almost duplicate) queue: While offering tasks to workQueue, also 
> offer them to a new queue (call it volatileWorkQueue?). The difference is, as 
> soon as a task from this is picked up for processing by the thread, it's 
> removed from the queue. At the end, the cleanup is done from the workQueue.
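
For illustration only, a minimal sketch of the peekAfter(last element) idea, using a 
plain in-memory list as a stand-in for the ZK distributed work queue; the class and 
method names are assumptions, not the actual Overseer code.

import java.util.LinkedList;

// Sketch of a look-ahead queue: peekAfter(last) returns the task following 'last',
// so the parent thread can hand out the next mutually exclusive task while 'last'
// is still being processed. Tasks are only removed once completed, so after a
// failure a new Overseer could re-consume and retry any in-flight task.
class LookAheadQueue<T> {
  private final LinkedList<T> items = new LinkedList<T>();

  synchronized void offer(T task) { items.addLast(task); }

  synchronized T peek() { return items.peekFirst(); }

  // Like peek(), but returns the task immediately after 'last' (or null).
  synchronized T peekAfter(T last) {
    int idx = items.indexOf(last);
    return (idx < 0 || idx + 1 >= items.size()) ? null : items.get(idx + 1);
  }

  // Cleanup once the worker thread reports completion.
  synchronized boolean removeCompleted(T task) { return items.remove(task); }
}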



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-5681) Make the OverseerCollectionProcessor multi-threaded

2014-04-29 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-5681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-5681:
-

Attachment: SOLR-5681.patch

Fixed OCPTest error

peekN() still does not take care of the problem I reported earlier

> Make the OverseerCollectionProcessor multi-threaded
> ---
>
> Key: SOLR-5681
> URL: https://issues.apache.org/jira/browse/SOLR-5681
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: Anshum Gupta
>Assignee: Anshum Gupta
> Attachments: SOLR-5681.patch, SOLR-5681.patch, SOLR-5681.patch, 
> SOLR-5681.patch
>
>
> Right now, the OverseerCollectionProcessor is single-threaded, i.e. submitting 
> anything long-running blocks the processing of other mutually 
> exclusive tasks.
> When OCP tasks become optionally async (SOLR-5477), it'd be good to have 
> truly non-blocking behavior by multi-threading the OCP itself.
> For example, a ShardSplit call on Collection1 would block the thread and 
> thereby prevent processing of a create-collection task (which would stay queued 
> in zk), even though the two tasks are mutually exclusive.
> Here are a few of the challenges:
> * Mutual exclusivity: Only let mutually exclusive tasks run in parallel. An 
> easy way to handle that is to only let 1 task per collection run at a time.
> * ZK Distributed Queue to feed tasks: The OCP consumes tasks from a queue. 
> The task from the workQueue is only removed on completion so that in case of 
> a failure, the new Overseer can re-consume the same task and retry. A queue 
> is not the right data structure in the first place to look ahead i.e. get the 
> 2nd task from the queue when the 1st one is in process. Also, deleting tasks 
> which are not at the head of a queue is not really an 'intuitive' thing.
> Proposed solutions for task management:
> * Task funnel and peekAfter(): The parent thread is responsible for getting 
> and passing the request to a new thread (or one from the pool). The parent 
> method uses a peekAfter(last element) instead of a peek(). The peekAfter 
> returns the task after the 'last element'. Maintain this request information 
> and use it for deleting/cleaning up the workQueue.
> * Another (almost duplicate) queue: While offering tasks to workQueue, also 
> offer them to a new queue (call it volatileWorkQueue?). The difference is, as 
> soon as a task from this is picked up for processing by the thread, it's 
> removed from the queue. At the end, the cleanup is done from the workQueue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5591) ReaderAndUpdates should create a proper IOContext when writing DV updates

2014-04-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984234#comment-13984234
 ] 

Shai Erera commented on LUCENE-5591:


BTW, I started by adding {{ramBytesUsed()}} to {{DocValuesFieldUpdates}}, but 
that over-estimated badly, especially when the number of updates is small. 
That's due to the buffers used by these classes, e.g. GrowableWriter with 
pageSize=1024. I don't think the RAM representation should be used as an 
estimate; the average update size is closer to what will eventually be 
written to disk.

> ReaderAndUpdates should create a proper IOContext when writing DV updates
> -
>
> Key: LUCENE-5591
> URL: https://issues.apache.org/jira/browse/LUCENE-5591
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Shai Erera
> Attachments: LUCENE-5591.patch
>
>
> Today we pass IOContext.DEFAULT. If DV updates are used in conjunction w/ 
> NRTCachingDirectory, it means the latter will attempt to write the entire DV 
> field in its RAMDirectory, which could lead to OOM.
> It would be good if we could build our own FlushInfo, estimating the number of 
> bytes we're about to write. I didn't see offhand a quick way to guesstimate 
> that - I thought to use the current DV's sizeInBytes as an approximation, but 
> I don't see a way to get it, not a direct way at least.
> Maybe we can use the size of the in-memory updates to guesstimate that 
> amount? Something like {{sizeOfInMemUpdates * (maxDoc/numUpdatedDocs)}}? Is 
> that too wild a guess?
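
For illustration, the guesstimate above could be computed roughly as follows; this is 
only a sketch and the names are made up, not an existing Lucene API.

// Rough sketch of sizeOfInMemUpdates * (maxDoc/numUpdatedDocs): scale the
// in-memory size of the updates up to the whole segment to approximate the
// number of bytes that will be flushed for the field.
class DvUpdateSizeEstimate {
  static long estimateFlushedBytes(long sizeOfInMemUpdates, int maxDoc, int numUpdatedDocs) {
    if (numUpdatedDocs <= 0) {
      return 0L;
    }
    return (long) (sizeOfInMemUpdates * ((double) maxDoc / numUpdatedDocs));
  }
}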



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5591) ReaderAndUpdates should create a proper IOContext when writing DV updates

2014-04-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-5591:
---

Attachment: LUCENE-5591.patch

The patch adds an approximate flush size:

* DocValuesFieldUpdates.avgUpdateSize, implemented by both Numeric and Binary.
* ReaderAndUpdates creates a FlushInfo w/ the total avgUpdateSize (over all 
fields) x maxDoc as an approximation. I also approximated the size of the FIS.
* Added two tests to TestNumeric/BinaryDVUpdates using NRTCachingDirectory, 
making sure that we don't pass IOContext.DEFAULT (ensuring there are no cached 
files after applying an update).

I think it's ready.
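
A minimal sketch of how such an approximation could feed into the IOContext, assuming a 
hypothetical per-document average update size; this is not the patch itself.

import org.apache.lucene.store.FlushInfo;
import org.apache.lucene.store.IOContext;

// Sketch: estimate the bytes about to be written as avgUpdateSize x maxDoc and
// wrap it in a FlushInfo, so e.g. NRTCachingDirectory sees a realistic size
// instead of IOContext.DEFAULT and can decide whether to cache the files.
class DvUpdateIOContext {
  static IOContext forUpdates(int maxDoc, long avgUpdateSizePerDoc) {
    long estimatedBytes = avgUpdateSizePerDoc * (long) maxDoc; // rough upper bound
    return new IOContext(new FlushInfo(maxDoc, estimatedBytes));
  }
}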

> ReaderAndUpdates should create a proper IOContext when writing DV updates
> -
>
> Key: LUCENE-5591
> URL: https://issues.apache.org/jira/browse/LUCENE-5591
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Shai Erera
> Attachments: LUCENE-5591.patch
>
>
> Today we pass IOContext.DEFAULT. If DV updates are used in conjunction w/ 
> NRTCachingDirectory, it means the latter will attempt to write the entire DV 
> field in its RAMDirectory, which could lead to OOM.
> It would be good if we could build our own FlushInfo, estimating the number of 
> bytes we're about to write. I didn't see offhand a quick way to guesstimate 
> that - I thought to use the current DV's sizeInBytes as an approximation, but 
> I don't see a way to get it, not a direct way at least.
> Maybe we can use the size of the in-memory updates to guesstimate that 
> amount? Something like {{sizeOfInMemUpdates * (maxDoc/numUpdatedDocs)}}? Is 
> that too wild a guess?



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers

2014-04-29 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984218#comment-13984218
 ] 

Shai Erera commented on LUCENE-5618:


bq. I don't like this pushing back against completely valid checks.

I'm not pushing back, I'm trying to have a discussion. Why do you assume that 
questions indicate push-back?

Do you also think that it's OK for a Codec to receive fields it never handled? 
If not, we should check that too. That to me indicates a bigger problem than 
sending a subset of fields.

I will look into adding another gen to SCI. But if all that we want to achieve 
is "That the field numbers it is using are valid!", there's another way to do 
that -- we can pass to a Codec a FieldsValidator or something for this purpose. 
That way we don't need to pass all FIs to a Codec and don't run into the 
PerFieldDVF issue I mentioned above, and don't complicate SCI with another gen. 
Just mentioning there are other ways to achieve consistency checks...
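
Purely as an illustration of the FieldsValidator idea: the interface below does not 
exist in Lucene, it is a hypothetical sketch of the alternative being discussed.

// Hypothetical callback a codec producer could be handed instead of a full
// FieldInfos: it only answers whether a field number about to be read is valid
// for this segment (including any docvalues update gens).
interface FieldsValidator {
  boolean isValidFieldNumber(int fieldNumber);
}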

> DocValues updates send wrong fieldinfos to codec producers
> --
>
> Key: LUCENE-5618
> URL: https://issues.apache.org/jira/browse/LUCENE-5618
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 4.9
>
>
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't 
> contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write 
> "batches" of fields in updates but just have only one field per gen? 
> This removes many-many relationships and would make things easy to understand.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers

2014-04-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5618:


Fix Version/s: 4.9

> DocValues updates send wrong fieldinfos to codec producers
> --
>
> Key: LUCENE-5618
> URL: https://issues.apache.org/jira/browse/LUCENE-5618
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.9
>
>
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't 
> contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write 
> "batches" of fields in updates but just have only one field per gen? 
> This removes many-many relationships and would make things easy to understand.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers

2014-04-29 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-5618:


Priority: Blocker  (was: Major)

> DocValues updates send wrong fieldinfos to codec producers
> --
>
> Key: LUCENE-5618
> URL: https://issues.apache.org/jira/browse/LUCENE-5618
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
>Priority: Blocker
> Fix For: 4.9
>
>
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't 
> contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write 
> "batches" of fields in updates but just have only one field per gen? 
> This removes many-many relationships and would make things easy to understand.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-5618) DocValues updates send wrong fieldinfos to codec producers

2014-04-29 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-5618?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13984205#comment-13984205
 ] 

Robert Muir commented on LUCENE-5618:
-

{quote}
What sort of index corruption does this check detect? As I see it, the Codec 
gets a subset of the fields that it already wrote.
{quote}

That the field numbers it is using are valid!

Please, stop pushing back on this. I will make this a blocker issue for 4.9. 
Maybe we should disable the dv update tests and enable this check in the 
meantime? This is the most fair.

I don't like this pushing back against completely valid checks.

> DocValues updates send wrong fieldinfos to codec producers
> --
>
> Key: LUCENE-5618
> URL: https://issues.apache.org/jira/browse/LUCENE-5618
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Robert Muir
> Fix For: 4.9
>
>
> Spinoff from LUCENE-5616.
> See the example there, docvalues readers get a fieldinfos, but it doesn't 
> contain the correct ones, so they have invalid field numbers at read time.
> This should really be fixed. Maybe a simple solution is to not write 
> "batches" of fields in updates but just have only one field per gen? 
> This removes many-many relationships and would make things easy to understand.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Resolved] (SOLR-6031) Getting Cannot find symbol while Compiling the java file.

2014-04-29 Thread Anshum Gupta (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-6031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anshum Gupta resolved SOLR-6031.


Resolution: Invalid

Kindly ask the 'How do I..' questions on the user-list/irc.

> Getting Cannot find symbol while Compiling the java file.
> -
>
> Key: SOLR-6031
> URL: https://issues.apache.org/jira/browse/SOLR-6031
> Project: Solr
>  Issue Type: Task
>  Components: clients - java
> Environment: Centos6.5, Solr-4.7.1, java version "1.7.0_51"
>Reporter: Vikash Kumar Singh
>Priority: Minor
>  Labels: newbie, test
>
> Here is the code which I am using, just for testing purposes, first on the console:
> import org.apache.solr.client.solrj.SolrServerException;
> import org.apache.solr.client.solrj.impl.HttpSolrServer;
> import org.apache.solr.client.solrj.SolrQuery;
> import org.apache.solr.client.solrj.response.QueryResponse;
> import org.apache.solr.common.SolrDocumentList;
> import java.net.MalformedURLException;
> public class SolrJSearcher
> {
>  public static void main(String[] args) throws 
> MalformedURLException,SolrServerException
>  {
> HttpSolrServer solr = new 
> HttpSolrServer("http://localhost:8983/solr");
> SolrQuery query = new SolrQuery();
> query.setQuery("sony digital camera");
> query.addFilterQuery("cat:electronics","store:amazon.com");
> query.setFields("id","price","merchant","cat","store");
> query.setStart(0);
> query.set("defType", "edismax");
> QueryResponse response = solr.query(query);
> SolrDocumentList results = response.getResults();
> for (int i = 0; i < results.size(); ++i)
>  {
>   System.out.println(results.get(i));
>  }
>  }
> }
> Also I have set the classpath as 
> export 
> CLASSPATH=/home/vikash/solr-4.7.1/dist/*.jar:/home/vikash/solr-4.7.1/dist/solrj-lib/*.jar
> but while compiling I still get these errors; I don't know what to do now, 
> please help.
> [root@localhost vikash]# javac SolrJSearcher.java 
> SolrJSearcher.java:1: package org.apache.solr.client.solrj does not exist
> import org.apache.solr.client.solrj.SolrServerException;
>^
> SolrJSearcher.java:2: package org.apache.solr.client.solrj.impl does not exist
> import org.apache.solr.client.solrj.impl.HttpSolrServer;
> ^
> SolrJSearcher.java:3: package org.apache.solr.client.solrj does not exist
> import org.apache.solr.client.solrj.SolrQuery;
>^
> SolrJSearcher.java:4: package org.apache.solr.client.solrj.response does not 
> exist
> import org.apache.solr.client.solrj.response.QueryResponse;
> ^
> SolrJSearcher.java:5: package org.apache.solr.common does not exist
> import org.apache.solr.common.SolrDocumentList;
>  ^
> SolrJSearcher.java:10: cannot find symbol
> symbol  : class SolrServerException
> location: class SolrJSearcher
>  public static void main(String[] args) throws 
> MalformedURLException,SolrServerException
>  ^
> SolrJSearcher.java:12: cannot find symbol
> symbol  : class HttpSolrServer
> location: class SolrJSearcher
>   HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
>   ^
> SolrJSearcher.java:12: cannot find symbol
> symbol  : class HttpSolrServer
> location: class SolrJSearcher
>   HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr");
> ^
> SolrJSearcher.java:13: cannot find symbol
> symbol  : class SolrQuery
> location: class SolrJSearcher
>   SolrQuery query = new SolrQuery();
>   ^
> SolrJSearcher.java:13: cannot find symbol
> symbol  : class SolrQuery
> location: class SolrJSearcher
>   SolrQuery query = new SolrQuery();
> ^
> SolrJSearcher.java:19: cannot find symbol
> symbol  : class QueryResponse
> location: class SolrJSearcher
>   QueryResponse response = solr.query(query);
>   ^
> SolrJSearcher.java:20: cannot find symbol
> symbol  : class SolrDocumentList
> location: class SolrJSearcher
>   SolrDocumentList results = response.getResults();
>   ^
> 12 errors
> [root@localhost vikash]# 



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org