[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246618#comment-14246618 ] Forest Soup edited comment on SOLR-6359 at 12/17/14 7:59 AM: - The numRecordsToKeep and maxNumLogsToKeep values should be in the updateLog, like below. Right? <!-- Enables a transaction log, used for real-time get, durability, and solr cloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). dir - the target directory for transaction logs, defaults to the solr data directory. --> <updateLog> <str name="dir">${solr.ulog.dir:}</str> <int name="numRecordsToKeep">1</int> <int name="maxNumLogsToKeep">100</int> </updateLog> was (Author: forest_soup): And where should I set the numRecordsToKeep and maxNumLogsToKeep values? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6816) Review SolrCloud Indexing Performance.
[ https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249639#comment-14249639 ] Per Steffensen commented on SOLR-6816: -- I believe today overwrite=false will prevent neither document-version-check on leader (it will in the Solr we use in my company, but not in Apache Solr) nor bucket-version-check on non-leaders. As far as I can see {{DistributedUpdateProcessor.versionAdd}} will do document-version-check if versionsStored=true, leaderLogic=true and versionOnUpdate != 0. It will do bucket-version-check if versionsStored=true and leaderLogic=false. This has nothing to do with the overwrite param. This version-check is not only for add-commands but also for delete-commands. The overwrite param controls only (in {{DirectUpdateHandler2}}) whether you make sure to delete an existing document with the same id before you add the new document. You do that by default, but if overwrite=false you just add the new document, allowing duplicates (defined to be documents that have the same id-value). So as far as I read the code, document-version-check will only be performed on leaders. Non-leaders will only do bucket-version-check, and I do not think that is expensive? As I said, our version of Solr does not do document-version-check if overwrite=false. I think you should introduce that as well. But besides that, what's left to do in this area? What did I not understand? Review SolrCloud Indexing Performance. -- Key: SOLR-6816 URL: https://issues.apache.org/jira/browse/SOLR-6816 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Mark Miller Priority: Critical Attachments: SolrBench.pdf We have never really focused on indexing performance, just correctness and low hanging fruit. We need to vet the performance and try to address any holes. Note: A common report is that adding any replication is very slow.
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
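The decision rules described in the comment above can be sketched as plain boolean logic. This is an illustration only, not Solr source code; the names versionsStored, leaderLogic and versionOnUpdate mirror the fields discussed for {{DistributedUpdateProcessor.versionAdd}}:

```java
// Illustrative sketch of the version-check decisions described above.
// Assumes the three inputs carry the same meaning as the fields discussed
// for DistributedUpdateProcessor.versionAdd; this is not the Solr source.
public class VersionCheckDecision {

    // Leader path: per-document version check happens only when versions are
    // stored, this node runs the leader logic, and a version was supplied.
    static boolean documentVersionCheck(boolean versionsStored,
                                        boolean leaderLogic,
                                        long versionOnUpdate) {
        return versionsStored && leaderLogic && versionOnUpdate != 0;
    }

    // Non-leader path: only the cheaper per-bucket version check applies.
    static boolean bucketVersionCheck(boolean versionsStored, boolean leaderLogic) {
        return versionsStored && !leaderLogic;
    }
}
```

Note that neither predicate consults the overwrite param, which is the point being made in the comment.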
[jira] [Assigned] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-5951: -- Assignee: Michael McCandless Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5951: --- Attachment: LUCENE-5951.patch Patch w/ tests. After I told Rob it's impossible to detect if a Path is backed by an SSD with pure Java, he of course went and did it ;) I added his isSSD method to IOUtils: it's a rough, Linux-only (for now) method to determine if a Path is backed by an SSD (thank you Rob!). Then I fixed CMS to have dynamic defaults, so that the first time merge is invoked, it checks the writer's directory. If it's on an SSD, it uses the pre LUCENE-4661 defaults (good for SSDs), else it uses the current defaults (good for spinning disks). It also logs this to infoStream so we can use that to see what it did. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
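A rough sketch of the Linux-only check discussed here: read the rotational flag from sysfs. Class and method names are hypothetical; the real IOUtils helper in the patch resolves the backing device from the index Path, which this sketch skips by taking the device name directly, and it parameterizes the sysfs root so the logic is testable.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Linux-only sketch: /sys/block/<dev>/queue/rotational holds "1" for spinning
// storage and "0" for SSDs. In real use sysfsRoot would be Paths.get("/sys/block").
public class RotationalCheck {
    static boolean isRotational(Path sysfsRoot, String device) throws IOException {
        Path flag = sysfsRoot.resolve(device).resolve("queue").resolve("rotational");
        return new String(Files.readAllBytes(flag)).trim().equals("1");
    }
}
```

Anything beyond Linux (or exotic setups like an index striped across devices) would need a different strategy, which is why the patch treats this as a rough heuristic.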
[jira] [Comment Edited] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14246618#comment-14246618 ] Forest Soup edited comment on SOLR-6359 at 12/17/14 10:01 AM: -- The numRecordsToKeep and maxNumLogsToKeep values should be in the updateLog, like below. <!-- Enables a transaction log, used for real-time get, durability, and solr cloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). dir - the target directory for transaction logs, defaults to the solr data directory. --> <updateLog> <str name="dir">${solr.ulog.dir:}</str> <int name="numRecordsToKeep">1</int> <int name="maxNumLogsToKeep">100</int> </updateLog> was (Author: forest_soup): The numRecordsToKeep and maxNumLogsToKeep values should be in the updateLog, like below. Right? <!-- Enables a transaction log, used for real-time get, durability, and solr cloud replica recovery. The log can grow as big as uncommitted changes to the index, so use of a hard autoCommit is recommended (see below). dir - the target directory for transaction logs, defaults to the solr data directory. --> <updateLog> <str name="dir">${solr.ulog.dir:}</str> <int name="numRecordsToKeep">1</int> <int name="maxNumLogsToKeep">100</int> </updateLog> Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options).
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249671#comment-14249671 ] Michael McCandless commented on LUCENE-6117: +1, thanks Rob! infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6816) Review SolrCloud Indexing Performance.
[ https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249673#comment-14249673 ] Per Steffensen commented on SOLR-6816: -- Those of you that have been following my comments on misc issues will know that I like separation of concerns. So in our version of Solr all this decision-making on when to do document-version-check, when to delete existing documents with same id-value etc is isolated in {{enum UpdateSemanticsMode}} - see https://issues.apache.org/jira/secure/attachment/12553312/SOLR-3173_3178_3382_3428_plus.patch. We support different modes that make slightly different decisions on the above topics, which is the reason for using an enum. You do not need that, because you only have one mode, but that should not prevent you from separating the decision-making concern. The patch is not entirely up to date with what we do today, but at least it illustrates the separation of concerns. {{DistributedUpdateHandler}} deals with a million concerns, so maybe you want to adopt that idea and move the code making the decisions out of {{DistributedUpdateHandler}}. Review SolrCloud Indexing Performance. -- Key: SOLR-6816 URL: https://issues.apache.org/jira/browse/SOLR-6816 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Mark Miller Priority: Critical Attachments: SolrBench.pdf We have never really focused on indexing performance, just correctness and low hanging fruit. We need to vet the performance and try to address any holes. Note: A common report is that adding any replication is very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-6816) Review SolrCloud Indexing Performance.
[ https://issues.apache.org/jira/browse/SOLR-6816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249673#comment-14249673 ] Per Steffensen edited comment on SOLR-6816 at 12/17/14 10:17 AM: - Those of you that have been following my comments on misc issues will know that I like separation of concerns. So in our version of Solr all this decision-making on when to do document-version-check, when to delete existing documents with same id-value etc is isolated in {{enum UpdateSemanticsMode}} - see https://issues.apache.org/jira/secure/attachment/12553312/SOLR-3173_3178_3382_3428_plus.patch. We support different modes that make slightly different decisions on the above topics, which is the reason for using an enum. You do not need that, because you only have one mode, but that should not prevent you from separating the decision-making concern. The patch is not entirely up to date with what we do today, but at least it illustrates the separation of concerns. {{DistributedUpdateHandler}} deals with a million concerns, so maybe you want to adopt that idea and move the code making the decisions out of {{DistributedUpdateHandler}}. I only mention this because I sense that at least [~shalinmangar] agrees that some cleanup (a.o. of {{DistributedUpdateHandler}}) is required: https://twitter.com/shalinmangar/status/543874893549277184 was (Author: steff1193): Those of you that have been following my comments on misc issues will know that I like separation of concerns. So in our version of Solr all this decision-making on when to do document-version-check, when to delete existing documents with same id-value etc is isolated in {{enum UpdateSemanticsMode}} - see https://issues.apache.org/jira/secure/attachment/12553312/SOLR-3173_3178_3382_3428_plus.patch. We support different modes that make slightly different decisions on the above topics, which is the reason for using an enum.
You do not need that, because you only have one mode, but that should not prevent you from separating the decision-making concern. The patch is not entirely up to date with what we do today, but at least it illustrates the separation of concerns. {{DistributedUpdateHandler}} deals with a million concerns, so maybe you want to adopt that idea and move the code making the decisions out of {{DistributedUpdateHandler}}. Review SolrCloud Indexing Performance. -- Key: SOLR-6816 URL: https://issues.apache.org/jira/browse/SOLR-6816 Project: Solr Issue Type: Task Components: SolrCloud Reporter: Mark Miller Priority: Critical Attachments: SolrBench.pdf We have never really focused on indexing performance, just correctness and low hanging fruit. We need to vet the performance and try to address any holes. Note: A common report is that adding any replication is very slow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6118) Improve efficiency of the history structure for filter caching
Adrien Grand created LUCENE-6118: Summary: Improve efficiency of the history structure for filter caching Key: LUCENE-6118 URL: https://issues.apache.org/jira/browse/LUCENE-6118 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor The filter caching uses a ring buffer that tracks frequencies of the hashcodes of the most-recently used filters. However it is based on an ArrayDeque<Integer> and a HashMap<Integer> which keep on (un)wrapping ints. Since the data-structure is very simple, we could try to do something better... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6640) ChaosMonkeySafeLeaderTest failure with CorruptIndexException
[ https://issues.apache.org/jira/browse/SOLR-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249708#comment-14249708 ] Shalin Shekhar Mangar commented on SOLR-6640: - I am looking at this failure too and I see another bug. I was wondering why the replica had these writes in the first place considering that recovery on startup wasn't complete yet. # RecoveryStrategy publishes the state of the replica as 'recovering' before it sets the update log to buffering mode which is why the leader sends updates to this replica that affect the index. # The test itself doesn't wait for a steady state e.g. by calling waitForRecovery or waitForThingsToLevelOut before starting the indexing threads. This is probably a good thing because that's what has helped us find this problem. # Shouldn't the peersync also be done while update log is set to buffering mode? {quote} So it's these files which are not getting removed when we do IW.rollback that were causing the problem - _0.cfe _0.cfs _0.si _0_1.liv _1.fdt _1.fdx I am yet to figure out whether these files should have been removed by IW.rollback() or not? {quote} These files hang around because an IndexReader is open using the IndexWriter due to soft commit(s). ChaosMonkeySafeLeaderTest failure with CorruptIndexException Key: SOLR-6640 URL: https://issues.apache.org/jira/browse/SOLR-6640 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 5.0 Reporter: Shalin Shekhar Mangar Fix For: 5.0 Attachments: Lucene-Solr-5.x-Linux-64bit-jdk1.8.0_20-Build-11333.txt, SOLR-6640.patch, SOLR-6640.patch Test failure found on jenkins: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11333/ {code} 1 tests failed. REGRESSION: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch Error Message: shard2 is not consistent.
Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 Stack Trace: java.lang.AssertionError: shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 at __randomizedtesting.SeedInfo.seed([F4B371D421E391CD:7555FFCC56BCF1F1]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1255) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1234) at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:162) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) {code} Cause of inconsistency is: {code} Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected segment id=yhq3vokoe1den2av9jbd3yp8, got=yhq3vokoe1den2av9jbd3yp7 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=/mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/test/J0/temp/solr.cloud.ChaosMonkeySafeLeaderTest-F4B371D421E391CD-001/tempDir-001/jetty3/index/_1_2.liv))) [junit4] 2 at org.apache.lucene.codecs.CodecUtil.checkSegmentHeader(CodecUtil.java:259) [junit4] 2 at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:88) [junit4] 2 at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64) [junit4] 2 at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:102) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6118) Improve efficiency of the history structure for filter caching
[ https://issues.apache.org/jira/browse/LUCENE-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Adrien Grand updated LUCENE-6118: - Attachment: LUCENE-6118.patch Here is a patch. No more java.lang.Integers and 22 bytes per entry on average (4 for the ring buffer and 18 for the bag that tracks frequencies). Improve efficiency of the history structure for filter caching -- Key: LUCENE-6118 URL: https://issues.apache.org/jira/browse/LUCENE-6118 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6118.patch The filter caching uses a ring buffer that tracks frequencies of the hashcodes of the most-recently used filters. However it is based on an ArrayDeque<Integer> and a HashMap<Integer> which keep on (un)wrapping ints. Since the data-structure is very simple, we could try to do something better... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
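A minimal sketch of the unboxed idea: a primitive-int ring buffer of recent hashcodes plus a small open-addressed table for the counts, so nothing is wrapped into java.lang.Integer on the hot path. Names, sizing, and the never-freed-slot simplification are illustrative only; the actual patch differs.

```java
// Sketch: track frequencies of the most recently seen hashcodes without boxing.
// Simplification: table slots are claimed forever (a count may drop to 0 but the
// key stays), which is fine for illustration but not for unbounded key sets.
public class FrequencyRing {
    private final int[] ring;          // most recent hashcodes, oldest overwritten
    private int next, size;
    private final int[] keys, counts;  // open-addressed key -> count table
    private final boolean[] used;
    private final int mask;

    public FrequencyRing(int ringSize) {
        ring = new int[ringSize];
        int cap = Integer.highestOneBit(ringSize * 4); // power of two, >= 2x ring
        keys = new int[cap];
        counts = new int[cap];
        used = new boolean[cap];
        mask = cap - 1;
    }

    private int slot(int key) {
        int s = key & mask;
        while (used[s] && keys[s] != key) s = (s + 1) & mask; // linear probing
        return s;
    }

    /** Record a hashcode, decrementing the count of the entry it evicts. */
    public void add(int hash) {
        if (size == ring.length) counts[slot(ring[next])]--;
        else size++;
        ring[next] = hash;
        next = (next + 1) % ring.length;
        int s = slot(hash);
        used[s] = true;
        keys[s] = hash;
        counts[s]++;
    }

    /** How many of the retained entries carry this hashcode. */
    public int frequency(int hash) {
        int s = slot(hash);
        return used[s] && keys[s] == hash ? counts[s] : 0;
    }
}
```

The memory cost here is a handful of parallel primitive arrays, in the same spirit as the "4 bytes for the ring buffer plus a small bag" accounting mentioned in the comment.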
[jira] [Comment Edited] (SOLR-6640) ChaosMonkeySafeLeaderTest failure with CorruptIndexException
[ https://issues.apache.org/jira/browse/SOLR-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249708#comment-14249708 ] Shalin Shekhar Mangar edited comment on SOLR-6640 at 12/17/14 10:59 AM: I am looking at this failure too and I see another bug. I was wondering why the replica had these writes in the first place considering that recovery on startup had not completed. # RecoveryStrategy publishes the state of the replica as 'recovering' before it sets the update log to buffering mode which is why the leader sends updates to this replica that affect the index. # The test itself doesn't wait for a steady state e.g. by calling waitForRecovery or waitForThingsToLevelOut before starting the indexing threads. This is probably a good thing because that's what has helped us find this problem. # Shouldn't the peersync also be done while update log is set to buffering mode? {quote} So it's these files which are not getting removed when we do IW.rollback that were causing the problem - _0.cfe _0.cfs _0.si _0_1.liv _1.fdt _1.fdx I am yet to figure out whether these files should have been removed by IW.rollback() or not? {quote} These files hang around because an IndexReader is open using the IndexWriter due to soft commit(s). was (Author: shalinmangar): I am looking at this failure too and I see another bug. I was wondering why did the replica have these writes in the first place considering that it hadn't recovery on startup wasn't complete yet. # RecoveryStrategy publishes the state of the replica as 'recovering' before it sets the update log to buffering mode which is why the leader sends updates to this replica that affect the index. # The test itself doesn't wait for a steady state e.g. by calling waitForRecovery or waitForThingsToLevelOut before starting the indexing threads. This is probably a good thing because that's what has helped us find this problem. 
# Shouldn't the peersync also be done while update log is set to buffering mode? {quote} So it's these files which are not getting removed when we do IW.rollback that were causing the problem - _0.cfe _0.cfs _0.si _0_1.liv _1.fdt _1.fdx I am yet to figure out whether these files should have been removed by IW.rollback() or not? {quote} These files hang around because an IndexReader is open using the IndexWriter due to soft commit(s). ChaosMonkeySafeLeaderTest failure with CorruptIndexException Key: SOLR-6640 URL: https://issues.apache.org/jira/browse/SOLR-6640 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 5.0 Reporter: Shalin Shekhar Mangar Fix For: 5.0 Attachments: Lucene-Solr-5.x-Linux-64bit-jdk1.8.0_20-Build-11333.txt, SOLR-6640.patch, SOLR-6640.patch Test failure found on jenkins: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11333/ {code} 1 tests failed. REGRESSION: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch Error Message: shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 Stack Trace: java.lang.AssertionError: shard2 is not consistent. 
Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 at __randomizedtesting.SeedInfo.seed([F4B371D421E391CD:7555FFCC56BCF1F1]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1255) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1234) at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:162) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) {code} Cause of inconsistency is: {code} Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected segment id=yhq3vokoe1den2av9jbd3yp8, got=yhq3vokoe1den2av9jbd3yp7 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=/mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/test/J0/temp/solr.cloud.ChaosMonkeySafeLeaderTest-F4B371D421E391CD-001/tempDir-001/jetty3/index/_1_2.liv))) [junit4] 2 at org.apache.lucene.codecs.CodecUtil.checkSegmentHeader(CodecUtil.java:259) [junit4] 2 at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:88) [junit4] 2 at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64) [junit4]
[jira] [Created] (LUCENE-6119) Add IndexWriter.getTotalNewBytesWritten
Michael McCandless created LUCENE-6119: -- Summary: Add IndexWriter.getTotalNewBytesWritten Key: LUCENE-6119 URL: https://issues.apache.org/jira/browse/LUCENE-6119 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk This method returns the number of incoming bytes IW has written since it was opened, excluding merging. It tracks flushed segments, new commits (segments_N), incoming files/segments added by addIndexes, newly written live docs / doc values updates files. It's an easy statistic for IW to track and should be useful to help applications more intelligently set defaults for IO throttling (RateLimiter). For example, an application that does hardly any indexing but finally triggered a large merge can afford to heavily throttle that large merge so it won't interfere with ongoing searches. But an application that's causing IW to write new bytes at 50 MB/sec must set a correspondingly higher IO throttle, otherwise merges will clearly fall behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6119) Add IndexWriter.getTotalNewBytesWritten
[ https://issues.apache.org/jira/browse/LUCENE-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-6119: --- Attachment: LUCENE-6119.patch Simple patch + test. Add IndexWriter.getTotalNewBytesWritten --- Key: LUCENE-6119 URL: https://issues.apache.org/jira/browse/LUCENE-6119 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6119.patch This method returns the number of incoming bytes IW has written since it was opened, excluding merging. It tracks flushed segments, new commits (segments_N), incoming files/segments added by addIndexes, newly written live docs / doc values updates files. It's an easy statistic for IW to track and should be useful to help applications more intelligently set defaults for IO throttling (RateLimiter). For example, an application that does hardly any indexing but finally triggered a large merge can afford to heavily throttle that large merge so it won't interfere with ongoing searches. But an application that's causing IW to write new bytes at 50 MB/sec must set a correspondingly higher IO throttle, otherwise merges will clearly fall behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
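The sizing idea in the description can be sketched numerically: sample the bytes-written counter twice, derive an ingest rate, and pick a merge throttle with some headroom above it. The 2x headroom factor and the 5 MB/sec floor below are arbitrary illustration values, and the counter referenced is the one proposed in this issue, not an existing API.

```java
// Hedged sketch: turn two samples of a bytes-written counter (such as the
// proposed IndexWriter.getTotalNewBytesWritten) into a suggested merge
// throttle. A mostly idle writer gets a heavy throttle (the floor); a writer
// ingesting 50 MB/sec of new bytes gets a correspondingly higher limit.
public class MergeThrottlePolicy {
    static double suggestMergeMBPerSec(long bytesBefore, long bytesAfter, double intervalSec) {
        double ingestMBPerSec = (bytesAfter - bytesBefore) / (1024.0 * 1024.0) / intervalSec;
        double floorMBPerSec = 5.0; // arbitrary: strong throttling when mostly idle
        double headroom = 2.0;      // arbitrary: merges must comfortably outpace ingest
        return Math.max(floorMBPerSec, headroom * ingestMBPerSec);
    }
}
```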
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14249722#comment-14249722 ] Alan Woodward commented on LUCENE-2878: --- Hi Rob, Thanks for helping out here! The additional branches are there to allow for returning NO_MORE_POSITIONS once the positions are exhausted, but maybe I should move that logic up to TermScorer instead and put the assertions back in for the PostingsReaders. Merging TermsEnum.docs() and TermsEnum.docsAndPositions() might be tricky because their API is different - docs() never returns null, while docsAndPositions() will return null if the relevant postings data isn't indexed. Although having said that, I'm probably already breaking that contract by redirecting from one to the other with the flags check. I'll fix TermScorer. I haven't nuked Spans yet, mainly because I think we should probably keep them (as deprecated) in 5.0, and remove them only in trunk. It would also make the patch bigger :-) I changed existing test files rather than adding any new ones, apart from the tests exercising the PositionFilterQueries. Maybe a way to reduce the size of the patch would be to remove the PositionFilterQueries from this issue and create a new one for them? Then this one is just about changing the DocsEnum/TermsEnum API. Allow Scorer to expose positions and payloads aka.
nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: Positions Branch Reporter: Simon Willnauer Assignee: Robert Muir Labels: gsoc2014 Fix For: Positions Branch Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries, the one which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do and at the end of the day they are duplicating a lot of code all over lucene. Span*Queries are also limited to other Span*Query instances such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that. Besides the Span*Query limitation, other queries lack a quite interesting feature since they can not score based on term proximity, since scores don't expose any positional information. All those problems bugged me for a while now so I started working on that using the bulkpostings API. I would have done that first cut on trunk but TermScorer is working on BlockReader that do not expose positions while the one in this branch does.
I started adding a new Positions class which users can pull from a scorer, to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API and others simply return null instead. To show that the API really works and our BulkPostings work fine too with positions I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, which now all works with positions :) while Payloads for bulkreading are kind of experimental in the patch and those only work with Standard codec. So all spans now work on top of TermScorer ( I truly hate spans since today ) including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext) which I should probably do on trunk first but after that pain today I need a break first :). The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't
[jira] [Commented] (SOLR-6640) ChaosMonkeySafeLeaderTest failure with CorruptIndexException
[ https://issues.apache.org/jira/browse/SOLR-6640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249771#comment-14249771 ] Shalin Shekhar Mangar commented on SOLR-6640: - So what is the right way to implement partial replication in this case? Force deleting the file (Varun's patch) probably won't work on windows and/or not play well with the open searchers. In SolrCloud we could just close the searcher before rollback because a replica in recovery won't get any search requests but that's not practical in standalone Solr because it'd cause downtime. ChaosMonkeySafeLeaderTest failure with CorruptIndexException Key: SOLR-6640 URL: https://issues.apache.org/jira/browse/SOLR-6640 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 5.0 Reporter: Shalin Shekhar Mangar Fix For: 5.0 Attachments: Lucene-Solr-5.x-Linux-64bit-jdk1.8.0_20-Build-11333.txt, SOLR-6640.patch, SOLR-6640.patch Test failure found on jenkins: http://jenkins.thetaphi.de/job/Lucene-Solr-5.x-Linux/11333/ {code} 1 tests failed. REGRESSION: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch Error Message: shard2 is not consistent. Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 Stack Trace: java.lang.AssertionError: shard2 is not consistent. 
Got 62 from http://127.0.0.1:57436/collection1lastClient and got 24 from http://127.0.0.1:53065/collection1 at __randomizedtesting.SeedInfo.seed([F4B371D421E391CD:7555FFCC56BCF1F1]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1255) at org.apache.solr.cloud.AbstractFullDistribZkTestBase.checkShardConsistency(AbstractFullDistribZkTestBase.java:1234) at org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.doTest(ChaosMonkeySafeLeaderTest.java:162) at org.apache.solr.BaseDistributedSearchTestCase.testDistribSearch(BaseDistributedSearchTestCase.java:869) {code} Cause of inconsistency is: {code} Caused by: org.apache.lucene.index.CorruptIndexException: file mismatch, expected segment id=yhq3vokoe1den2av9jbd3yp8, got=yhq3vokoe1den2av9jbd3yp7 (resource=BufferedChecksumIndexInput(MMapIndexInput(path=/mnt/ssd/jenkins/workspace/Lucene-Solr-5.x-Linux/solr/build/solr-core/test/J0/temp/solr.cloud.ChaosMonkeySafeLeaderTest-F4B371D421E391CD-001/tempDir-001/jetty3/index/_1_2.liv))) [junit4] 2 at org.apache.lucene.codecs.CodecUtil.checkSegmentHeader(CodecUtil.java:259) [junit4] 2 at org.apache.lucene.codecs.lucene50.Lucene50LiveDocsFormat.readLiveDocs(Lucene50LiveDocsFormat.java:88) [junit4] 2 at org.apache.lucene.codecs.asserting.AssertingLiveDocsFormat.readLiveDocs(AssertingLiveDocsFormat.java:64) [junit4] 2 at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:102) {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249785#comment-14249785 ] Adrien Grand commented on LUCENE-5951: -- +1 Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
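For readers following along, the /sys/block check described above can be sketched in plain Java. All names here (class, methods) are made up for illustration; this is not the patch attached to the issue:

```java
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Sketch of the LUCENE-5951 idea: on Linux, guess whether the device backing
 * an index is spinning storage by reading /sys/block/<dev>/queue/rotational.
 * Hypothetical names, not the committed Lucene API.
 */
public class RotationalCheck {

    /** Interprets the contents of a sysfs "rotational" file: "0" means SSD. */
    public static boolean isSpinning(String rotationalFileContents) {
        return !"0".equals(rotationalFileContents.trim());
    }

    /** Default merge threads as proposed: 3 on SSD, 1 on spinning disk. */
    public static int defaultMaxMergeThreads(boolean spinning) {
        return spinning ? 1 : 3;
    }

    /** Best-effort check for a concrete device name, e.g. "sda". */
    public static boolean isSpinning(Path sysBlockRoot, String dev) {
        try {
            Path p = sysBlockRoot.resolve(dev).resolve("queue").resolve("rotational");
            return isSpinning(new String(Files.readAllBytes(p)));
        } catch (Exception e) {
            return true; // unknown platform/device: assume spinning, the conservative default
        }
    }

    public static void main(String[] args) {
        // "0" in the rotational file means SSD, so 3 merge threads
        System.out.println(defaultMaxMergeThreads(isSpinning("0\n"))); // 3
    }
}
```

The fallback deliberately assumes spinning storage when the sysfs file is missing, since the lower thread count is the safer default.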
[jira] [Updated] (SOLR-6801) Load components from blob store
[ https://issues.apache.org/jira/browse/SOLR-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6801: - Description: The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "lib": "mycomponent", "version": 2 } }' {code} was: The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "startup": "lazy", "lib": ".system:mycomponent", "version": 2 } }' {code} Load components from blob store --- Key: SOLR-6801 URL: https://issues.apache.org/jira/browse/SOLR-6801 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "lib": "mycomponent", "version": 2 } }' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6801) Load components from blob store
[ https://issues.apache.org/jira/browse/SOLR-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6801: - Attachment: SOLR-6801.patch Feature complete. No testcases yet. I will add the testcases, do some refactoring, and commit this soon. Comments/suggestions are welcome. Load components from blob store --- Key: SOLR-6801 URL: https://issues.apache.org/jira/browse/SOLR-6801 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6801.patch The solrconfig APIs (SOLR-6607) now allow registering components through the API. SOLR-6787 will add support for blob storage. Jars should be able to be loaded from blobs. Example: {code} curl http://localhost:8983/solr/gettingstarted/config -H 'Content-Type: application/json' -d '{ "create-requesthandler": { "name": "/mypath", "class": "org.apache.solr.handler.DumpRequestHandler", "lib": "mycomponent", "version": 2 } }' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249820#comment-14249820 ] Shalin Shekhar Mangar commented on LUCENE-5951: --- +1 Very nice! Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/<dev>/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6683) Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery
[ https://issues.apache.org/jira/browse/SOLR-6683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249826#comment-14249826 ] Forest Soup commented on SOLR-6683: --- I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected. When I set the config below, it still goes into the SnapPuller code even though I only newly added 800 docs.
{code}
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>
{code}
After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:
{code}
if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}
{code}
Could you please comment? Thanks! Need a configurable parameter to control the doc number between peersync and the snapshot pull recovery --- Key: SOLR-6683 URL: https://issues.apache.org/jira/browse/SOLR-6683 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.7 Environment: Redhat Linux 64bit Reporter: Forest Soup Priority: Critical Labels: performance If there is a gap of more than 100 docs between the recovering node and the good node, Solr will do a snap pull recovery instead of a peersync. Can the 100 docs be configurable? For example, there can be a 1, 1000, or 10 docs gap between the good node and the node to recover. For a larger gap, a regular restart of a Solr node will trigger a full recovery, which is a huge impact on the performance of the running systems. Thanks! -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
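Reduced to its essentials, the guard quoted above compares the two replicas' version windows. A standalone sketch of that comparison (the class here is hypothetical; only the condition mirrors the quoted Solr code):

```java
/**
 * Standalone sketch of the version-window guard quoted above from
 * PeerSync.handleVersions. Versions are longs that grow over time.
 */
public class VersionWindowCheck {

    /**
     * PeerSync is only safe when our recent-updates window overlaps the other
     * replica's window. If our highest retained version is still older than
     * the other replica's lowest retained version, we might miss updates, so
     * the sync must fail and a full (SnapPuller) recovery is used instead.
     */
    public static boolean canPeerSync(long ourHighThreshold, long otherLow) {
        // quoted code fails the sync when (ourHighThreshold < otherLow)
        return ourHighThreshold >= otherLow;
    }

    public static void main(String[] args) {
        System.out.println(canPeerSync(200L, 100L)); // overlapping windows: true
        System.out.println(canPeerSync(100L, 200L)); // disjoint windows: false
    }
}
```

This illustrates why a larger numRecordsToKeep raises ourHighThreshold's reach: keeping more records widens our window, making the overlap (and thus peersync) more likely after a restart.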
[jira] [Commented] (SOLR-6359) Allow customization of the number of records and logs kept by UpdateLog
[ https://issues.apache.org/jira/browse/SOLR-6359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249827#comment-14249827 ] Forest Soup commented on SOLR-6359: --- I applied the patch for SOLR-6359 on 4.7 and did some tests. It does not work as expected. When I set the config below, it still goes into the SnapPuller code even though I only newly added 800 docs.
{code}
<updateLog>
  <str name="dir">${solr.ulog.dir:}</str>
  <int name="numRecordsToKeep">1</int>
  <int name="maxNumLogsToKeep">100</int>
</updateLog>
{code}
After reading the code, it seems these lines in org.apache.solr.update.PeerSync.handleVersions(ShardResponse srsp) cause the issue:
{code}
if (ourHighThreshold < otherLow) {
  // Small overlap between version windows and ours is older
  // This means that we might miss updates if we attempted to use this method.
  // Since there exists just one replica that is so much newer, we must
  // fail the sync.
  log.info(msg() + "Our versions are too old. ourHighThreshold=" + ourHighThreshold + " otherLowThreshold=" + otherLow);
  return false;
}
{code}
Could you please comment? Thanks! Allow customization of the number of records and logs kept by UpdateLog --- Key: SOLR-6359 URL: https://issues.apache.org/jira/browse/SOLR-6359 Project: Solr Issue Type: Improvement Reporter: Ramkumar Aiyengar Assignee: Mark Miller Priority: Minor Fix For: 5.0, Trunk Currently {{UpdateLog}} hardcodes the number of logs and records it keeps, and the hardcoded numbers (100 records, 10 logs) can be quite low (esp. the records) in a heavily indexing setup, leading to full recovery even if Solr was just stopped and restarted. These values should be customizable (even if only present as expert options). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6119) Add IndexWriter.getTotalNewBytesWritten
[ https://issues.apache.org/jira/browse/LUCENE-6119?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249837#comment-14249837 ] Michael McCandless commented on LUCENE-6119: Thinking about this more ... it may be better to do this entirely inside a FilterDirectory. E.g. when IndexOutput is closed, and the IOContext is not MERGE, increment the bytes written ... and then that same directory instance could dynamically update the target merge throttling ... maybe. Add IndexWriter.getTotalNewBytesWritten --- Key: LUCENE-6119 URL: https://issues.apache.org/jira/browse/LUCENE-6119 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 5.0, Trunk Attachments: LUCENE-6119.patch This method returns number of incoming bytes IW has written since it was opened, excluding merging. It tracks flushed segments, new commits (segments_N), incoming files/segments by addIndexes, newly written live docs / doc values updates files. It's an easy statistic for IW to track and should be useful to help applications more intelligently set defaults for IO throttling (RateLimiter). For example, an application that does hardly any indexing but finally triggered a large merge can afford to heavily throttle that large merge so it won't interfere with ongoing searches. But an application that's causing IW to write new bytes at 50 MB/sec must set a correspondingly higher IO throttling otherwise merges will clearly fall behind. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
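The count-on-close idea in the comment above can be shown with a plain-Java analogue of such a filtering wrapper. Everything here is illustrative (it wraps java.io streams, not Lucene's Directory/IndexOutput); in the real proposal the wrapper would only be applied when the IOContext is not MERGE:

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Plain-java analogue of the FilterDirectory idea: wrap every output that is
 * NOT opened for a merge, and add its size to a shared counter when it is
 * closed. Names are illustrative only, not Lucene's API.
 */
public class CountingOutput extends FilterOutputStream {
    private final AtomicLong totalNewBytes;
    private long written;

    public CountingOutput(OutputStream delegate, AtomicLong totalNewBytes) {
        super(delegate);
        this.totalNewBytes = totalNewBytes;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        written++; // FilterOutputStream routes bulk writes through here too
    }

    @Override
    public void close() throws IOException {
        super.close();
        totalNewBytes.addAndGet(written); // publish the count only on close
    }

    public static void main(String[] args) throws IOException {
        AtomicLong total = new AtomicLong();
        CountingOutput co = new CountingOutput(new ByteArrayOutputStream(), total);
        co.write(new byte[] {1, 2, 3});
        co.close();
        System.out.println(total.get()); // 3
    }
}
```

Publishing the per-file count only at close time keeps the shared counter cheap to maintain and means a throttling policy reading it sees whole files, not partial flushes.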
[jira] [Commented] (SOLR-6850) AutoAddReplicas does not wait enough for a replica to get live
[ https://issues.apache.org/jira/browse/SOLR-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249843#comment-14249843 ] Varun Thacker commented on SOLR-6850: - [~markrmil...@gmail.com] What are your thoughts on this? AutoAddReplicas does not wait enough for a replica to get live -- Key: SOLR-6850 URL: https://issues.apache.org/jira/browse/SOLR-6850 Project: Solr Issue Type: Bug Affects Versions: 4.10, 4.10.1, 4.10.2, 5.0, Trunk Reporter: Varun Thacker Attachments: SOLR-6850.patch, SOLR-6850.patch After we have detected that a replica needs failing over, we add a replica and wait to see if it's live. Currently we only wait for 30ms , but I think the intention here was to wait for 30s. In CloudStateUtil.waitToSeeLive() the conversion should have been {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.SECONDS);}} instead of {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.MILLISECONDS);}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
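The unit mix-up described in the issue is easy to see with TimeUnit directly. The method names below are illustrative, not the actual CloudStateUtil code:

```java
import java.util.concurrent.TimeUnit;

/**
 * Illustration of the SOLR-6850 bug: with an input of 30, converting from
 * MILLISECONDS yields a ~30ms deadline in nanoseconds, while the intended
 * conversion from SECONDS yields 30s — a factor of 1000 difference.
 */
public class TimeoutConversion {

    /** What the code did: treats the value as milliseconds (30ms wait). */
    public static long waitNanosActual(long timeout) {
        return TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.MILLISECONDS);
    }

    /** What was intended: treats the value as seconds (30s wait). */
    public static long waitNanosIntended(long timeout) {
        return TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS);
    }

    public static void main(String[] args) {
        System.out.println(waitNanosActual(30));   // 30000000
        System.out.println(waitNanosIntended(30)); // 30000000000
    }
}
```

Passing the source unit as the second argument is the classic TimeUnit.convert pitfall: the deadline computed with the wrong unit is 1000x too short, so the replica rarely had time to come up.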
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249855#comment-14249855 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646240 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1646240 ] LUCENE-6117: make infostream usable again infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249858#comment-14249858 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646242 from [~rcmuir] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646242 ] LUCENE-6117: make infostream usable again infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-6117. - Resolution: Fixed Fix Version/s: Trunk 5.0 infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, Trunk Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6114) Remove bw compat cruft from packedints
[ https://issues.apache.org/jira/browse/LUCENE-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249866#comment-14249866 ] ASF subversion and git services commented on LUCENE-6114: - Commit 1646247 from [~rcmuir] in branch 'dev/trunk' [ https://svn.apache.org/r1646247 ] LUCENE-6114: remove bw compat cruft from packedints Remove bw compat cruft from packedints -- Key: LUCENE-6114 URL: https://issues.apache.org/jira/browse/LUCENE-6114 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Fix For: Trunk Attachments: LUCENE-6114.patch In trunk we have some old logic that is not needed (versions 0 and 1). So we can remove support for structures that aren't byte-aligned, zigzag-encoded monotonics, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-6114) Remove bw compat cruft from packedints
[ https://issues.apache.org/jira/browse/LUCENE-6114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-6114. - Resolution: Fixed Remove bw compat cruft from packedints -- Key: LUCENE-6114 URL: https://issues.apache.org/jira/browse/LUCENE-6114 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir Fix For: Trunk Attachments: LUCENE-6114.patch In trunk we have some old logic that is not needed (versions 0 and 1). So we can remove support for structures that aren't byte-aligned, zigzag-encoded monotonics, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6854) Stale cached state in CloudSolrServer
[ https://issues.apache.org/jira/browse/SOLR-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul reassigned SOLR-6854: Assignee: Noble Paul Stale cached state in CloudSolrServer - Key: SOLR-6854 URL: https://issues.apache.org/jira/browse/SOLR-6854 Project: Solr Issue Type: Bug Components: SolrCloud, SolrJ Reporter: Jessica Cheng Mallet Assignee: Noble Paul Labels: cache, solrcloud, solrj CloudSolrServer’s cached state is not being updated for a newly created collection if we started polling for the collection state too early and a down state is cached. Requests to the newly created collection continue to fail with "No live SolrServers available to handle this request" until the cache is invalidated by time. Logging on the client side reveals that while the state in ZkStateReader is updated to active, the cached state in CloudSolrServer remains down.
{quote}
CloudSolrServer cached state: DocCollection(collection-1418250319268)={
  "shards":{"shard1":{
    "range":"8000-7fff",
    "state":"active",
    "replicas":{"core_node1":{
      "state":"down",
      "base_url":"http://localhost:8983/solr",
      "core":"collection-1418250319268_shard1_replica1",
      "node_name":"localhost:8983_solr"}}}},
  "maxShardsPerNode":"1",
  "external":"true",
  "router":{"name":"compositeId"},
  "replicationFactor":"1"}

ZkStateReader state: DocCollection(collection-1418250319268)={
  "shards":{"shard1":{
    "range":"8000-7fff",
    "state":"active",
    "replicas":{"core_node1":{
      "state":"active",
      "base_url":"http://localhost:8983/solr",
      "core":"collection-1418250319268_shard1_replica1",
      "node_name":"localhost:8983_solr",
      "leader":"true"}}}},
  "maxShardsPerNode":"1",
  "router":{"name":"compositeId"},
  "external":"true",
  "replicationFactor":"1"}
{quote}
-- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6127) Improve Solr's exampledocs data
[ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249871#comment-14249871 ] Varun Thacker commented on SOLR-6127: - I think we could do the following - 1. Take the film.json|xml|csv files and replace all the data in the exampledocs folder with them 2. Put the python script in the dev-tools folder so that in the future if we want to update the data we can use it. 3. Drop the LICENSE.txt file in the exampledocs folder? On the website these places would need to be updated - Indexing Solr XML, Indexing JSON, Indexing CSV (Comma/Column Separated Values) - http://lucene.apache.org/solr/quickstart.html Maybe also update the Searching section on the quickstart page? We could use the material attached in the README.txt uploaded here. Oh, and we will have to update the schema in the sample_techproducts_configs configset and the browse handler in solrconfig with the new data too. Improve Solr's exampledocs data --- Key: SOLR-6127 URL: https://issues.apache.org/jira/browse/SOLR-6127 Project: Solr Issue Type: Improvement Components: documentation, scripts and tools Reporter: Varun Thacker Assignee: Erik Hatcher Fix For: 5.0, Trunk Attachments: LICENSE.txt, README.txt, README.txt, film.csv, film.json, film.xml, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py Currently - The CSV example has 10 documents. - The JSON example has 4 documents. - The XML example has 32 documents. 1. We should have equal number of documents and the same documents in all the example formats 2. A data set which is slightly more comprehensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6554) Speed up overseer operations for collections with stateFormat 1
[ https://issues.apache.org/jira/browse/SOLR-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6554: Attachment: SOLR-6554-workqueue-fixes.patch
# Fixes the logic to add/remove items from the workQueue so that the invariant is maintained
# Adds a ZkWriteListener interface which is used by Overseer to add/remove items from the workQueue depending on how/when state is flushed to ZK
# The earlier patches enabled batching on work queue processing but that is wrong because we do not have any fallback if a batch fails. So batching is disabled whenever we operate on items from the work queue.
Speed up overseer operations for collections with stateFormat 1 - Key: SOLR-6554 URL: https://issues.apache.org/jira/browse/SOLR-6554 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 5.0, Trunk Reporter: Shalin Shekhar Mangar Attachments: SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-workqueue-fixes.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch Right now (after SOLR-5473 was committed), a node watches a collection only if stateFormat=1 or if that node hosts at least one core belonging to that collection. This means that a node which is the overseer operates on all collections but watches only a few. So any read goes directly to zookeeper which slows down overseer operations. Let's have the overseer node watch all collections always and never remove those watches (except when the collection itself is deleted). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2878) Allow Scorer to expose positions and payloads aka. nuke spans
[ https://issues.apache.org/jira/browse/LUCENE-2878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249924#comment-14249924 ] ASF subversion and git services commented on LUCENE-2878: - Commit 1646271 from [~romseygeek] in branch 'dev/branches/lucene2878' [ https://svn.apache.org/r1646271 ] LUCENE-2878: Remove dead code from TermScorer Allow Scorer to expose positions and payloads aka. nuke spans -- Key: LUCENE-2878 URL: https://issues.apache.org/jira/browse/LUCENE-2878 Project: Lucene - Core Issue Type: Improvement Components: core/search Affects Versions: Positions Branch Reporter: Simon Willnauer Assignee: Robert Muir Labels: gsoc2014 Fix For: Positions Branch Attachments: LUCENE-2878-OR.patch, LUCENE-2878-vs-trunk.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878.patch, LUCENE-2878_trunk.patch, LUCENE-2878_trunk.patch, PosHighlighter.patch, PosHighlighter.patch Currently we have two somewhat separate types of queries: the ones which can make use of positions (mainly spans) and payloads (spans). Yet Span*Query doesn't really do scoring comparable to what other queries do, and at the end of the day they duplicate a lot of code all over Lucene. Span*Queries are also limited to other Span*Query instances, such that you can not use a TermQuery or a BooleanQuery with SpanNear or anything like that.
Besides the Span*Query limitation, other queries lack a quite interesting feature: they can not score based on term proximity, since scorers don't expose any positional information. All those problems bugged me for a while now, so I started working on that using the bulkpostings API. I would have done that first cut on trunk, but TermScorer there works on a BlockReader that does not expose positions, while the one in this branch does. I started adding a new Positions class which users can pull from a scorer; to prevent unnecessary positions enums I added ScorerContext#needsPositions and eventually Scorer#needsPayloads to create the corresponding enum on demand. Yet, currently only TermQuery / TermScorer implements this API and others simply return null instead. To show that the API really works, and that our BulkPostings work fine with positions too, I cut over TermSpanQuery to use a TermScorer under the hood and nuked TermSpans entirely. A nice side effect of this was that the Position BulkReading implementation got some exercise, which now :) all works with positions, while Payloads for bulk reading are kind of experimental in the patch and only work with the Standard codec. So all spans now work on top of TermScorer (I truly hate spans since today), including the ones that need Payloads (StandardCodec ONLY)!! I didn't bother to implement the other codecs yet since I want to get feedback on the API and on this first cut before I go on with it. I will upload the corresponding patch in a minute. I also had to cut over SpanQuery.getSpans(IR) to SpanQuery.getSpans(AtomicReaderContext), which I should probably do on trunk first, but after that pain today I need a break first :).
The patch passes all core tests (org.apache.lucene.search.highlight.HighlighterTest still fails but I didn't look into the MemoryIndex BulkPostings API yet) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249927#comment-14249927 ] ASF subversion and git services commented on SOLR-6857: --- Commit 1646272 from [~sar...@syr.edu] in branch 'dev/trunk' [ https://svn.apache.org/r1646272 ] SOLR-6857: Idea modules missing dependencies Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249938#comment-14249938 ] ASF subversion and git services commented on SOLR-6857: --- Commit 1646275 from [~sar...@syr.edu] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646275 ] SOLR-6857: Idea modules missing dependencies (merged trunk r1646272) Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6797) Add score=degrees|kilometers|miles for AbstractSpatialFieldType
[ https://issues.apache.org/jira/browse/SOLR-6797?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ishan Chattopadhyaya updated SOLR-6797: --- Attachment: SOLR-6797.patch That makes sense; more intuitive than a separate units/distanceUnits parameter. Attached a patch that supports score=distance (back compat, km when geo) | kilometers | miles | degrees | area/area2D (km^2 when geo, deg^2 in 2D). Tested manually and seems to work. Add score=degrees|kilometers|miles for AbstractSpatialFieldType --- Key: SOLR-6797 URL: https://issues.apache.org/jira/browse/SOLR-6797 Project: Solr Issue Type: Improvement Components: spatial Reporter: David Smiley Attachments: SOLR-6797.patch Annoyingly, the units=degrees attribute is required for fields extending AbstractSpatialFieldType (e.g. RPT, BBox). And it doesn't really have any effect. I propose the following: * Simply drop the attribute; ignore it if someone sets it to degrees (for back-compat). * When using score=distance, or score=area or area2D (as seen in BBoxField) then use kilometers if geo=true, otherwise degrees. * Add support for score=degrees|kilometers|miles|degrees -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Strassburg resolved SOLR-6857. Resolution: Fixed Fix Version/s: Trunk 5.0 Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-6120) how should MockIndexOutputWrapper.close handle exceptions in delegate.close
Michael McCandless created LUCENE-6120: -- Summary: how should MockIndexOutputWrapper.close handle exceptions in delegate.close Key: LUCENE-6120 URL: https://issues.apache.org/jira/browse/LUCENE-6120 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Michael McCandless Priority: Minor Chasing a tricky Elasticsearch test failure, it came down to the delegate.close throwing an exception (ClosedByInterruptException, disturbingly, in this case), causing MockIndexOutputWrapper.close to fail to remove that IO from MDW's map. The question is, what should we do here, when delegate.close throws an exception? Is the delegate in fact closed, even when it throws an exception? Java8's docs on java.io.Closeable say this: As noted in AutoCloseable.close(), cases where the close may fail require careful attention. It is strongly advised to relinquish the underlying resources and to internally mark the Closeable as closed, prior to throwing the IOException. And our OutputStreamIndexOutput is careful about this (flushes, then closes in a try-with-resources). So, I think MDW should be fixed to mark the IO as closed even if delegate.close throws an exception... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
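The fix direction proposed above can be sketched with a minimal wrapper. This is a hedged illustration, not Lucene's actual MockIndexOutputWrapper: the point is simply to mark the wrapper closed *before* delegating, so a throwing delegate.close() still leaves the bookkeeping consistent.

```java
import java.io.Closeable;
import java.io.IOException;

public class CloseFirstWrapper implements Closeable {
    private final Closeable delegate;
    private boolean closed;

    public CloseFirstWrapper(Closeable delegate) { this.delegate = delegate; }

    public boolean isClosed() { return closed; }

    @Override
    public void close() throws IOException {
        if (closed) return;   // double-close stays a no-op
        closed = true;        // per Closeable's advice: mark closed before throwing
        delegate.close();     // may still throw; caller sees it, but our state is settled
    }

    public static void main(String[] args) {
        CloseFirstWrapper w = new CloseFirstWrapper(() -> { throw new IOException("boom"); });
        try {
            w.close();
        } catch (IOException expected) {
            // the exception propagates, but the wrapper no longer counts as open
        }
        System.out.println(w.isClosed()); // true
    }
}
```

The same pattern, applied to MDW, would keep the open-files map accurate even when the delegate fails mid-close.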
[jira] [Assigned] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe reassigned SOLR-6857: Assignee: Steve Rowe Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Assignee: Steve Rowe Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249960#comment-14249960 ] Steve Rowe commented on SOLR-6857: -- Thanks [~jstrassburg], I committed your patch to trunk and branch_5x. On branch_5x under Java7, the patch isn't required, but it is for some reason under Java8, so I committed there too. I was going to resolve the issue but you had already done so :) - the convention is that the person who commits the fix resolves the issue. Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5951: Attachment: LUCENE-5951.patch Try to improve the SSD detector more to make it safe to use for this purpose. It was mostly a joke and really ... not good code. :) * fix contract to throw IOException when incoming path does not exist. This is important not to mask. * for our internal heuristics, we could easily trigger SecurityException / AIOOBE, we are doing things that are not guaranteed at all. So those are important to mask. * don't use Files.readAllBytes, that method is too dangerous in these heuristics. Just read one byte. We should improve the getDeviceName too, but it's less critical. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
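The Linux heuristic discussed in this issue can be sketched roughly as follows. This is a hedged, Linux-only illustration (not the committed patch): it reads a single byte from /sys/block/&lt;device&gt;/queue/rotational ('0' means non-rotational, i.e. SSD), leaves device-name resolution out of scope, and deliberately masks every failure by falling back to "assume spinning", since heuristics like this must never throw.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class SpinCheck {
    public static boolean spins(String deviceName) {
        Path p = Paths.get("/sys/block", deviceName, "queue", "rotational");
        try (InputStream in = Files.newInputStream(p)) {
            return in.read() != '0'; // one byte is enough; avoids Files.readAllBytes
        } catch (Exception e) {
            return true; // missing path, SecurityException, etc.: mask and be conservative
        }
    }

    public static void main(String[] args) {
        // An unknown device resolves to "spinning" on any OS.
        System.out.println(spins("no-such-device")); // true
    }
}
```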
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249966#comment-14249966 ] James Strassburg commented on SOLR-6857: OK, this was my first submission. I saw the commit and verified it before closing but I won't close them in the future. thanks. Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Assignee: Steve Rowe Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6858) Leader sync's PeerSync use cannot consider SocketException or NoHttpResponseException success.
Mark Miller created SOLR-6858: - Summary: Leader sync's PeerSync use cannot consider SocketException or NoHttpResponseException success. Key: SOLR-6858 URL: https://issues.apache.org/jira/browse/SOLR-6858 Project: Solr Issue Type: Bug Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 5.0 -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6857) Idea modules missing dependencies
[ https://issues.apache.org/jira/browse/SOLR-6857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14249973#comment-14249973 ] Steve Rowe commented on SOLR-6857: -- bq. this was my first submission Cool, thanks again, (more of your) patches welcome! Idea modules missing dependencies - Key: SOLR-6857 URL: https://issues.apache.org/jira/browse/SOLR-6857 Project: Solr Issue Type: Bug Components: Build Affects Versions: Trunk Environment: IntelliJ IDEA Reporter: James Strassburg Assignee: Steve Rowe Priority: Trivial Fix For: 5.0, Trunk Attachments: SOLR-6857.patch The IDEA dev-tools configuration doesn't build in IDEA after running ant idea because the following modules are missing a dependency to analysis-common module: * velocity * extraction * map-reduce * dataimporthandler-extras To reproduce, run ant clean-idea followed by ant idea. Open the project in IDEA, configure the JDK, and make the project. The modules listed above will fail with an error finding org.apache.lucene.analysis.util.ResourceLoader. Adding analysis-common as a module dependency fixes this. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6859) Disable REBALANCELEADERS for 5.0
Erick Erickson created SOLR-6859: Summary: Disable REBALANCELEADERS for 5.0 Key: SOLR-6859 URL: https://issues.apache.org/jira/browse/SOLR-6859 Project: Solr Issue Type: Bug Reporter: Erick Erickson Assignee: Erick Erickson Priority: Blocker This is flat dangerous with it's current implementation and should not get into the wild. The (I hope) proper fix is in SOLR-6691. I want to let that code bake for a while post 5.0 before committing though. So this will just comment the handling of REBALANCELEADERS from the collections API for the time being. Marked as blocker, but I should be able to take care of this ASAP so it shouldn't stand in the way of 5.0. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6860) Re-enable REBALANCELEADERS for 5.1
Erick Erickson created SOLR-6860: Summary: Re-enable REBALANCELEADERS for 5.1 Key: SOLR-6860 URL: https://issues.apache.org/jira/browse/SOLR-6860 Project: Solr Issue Type: Improvement Reporter: Erick Erickson Assignee: Erick Erickson The rebalanceleaders command is disabled for 5.0 to allow more baking time. This ticket is to re-enable it (just uncomment it in collections api handling) and merge SOLR-6691 into 5.1 after 5.0 has been cut. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6127) Improve Solr's exampledocs data
[ https://issues.apache.org/jira/browse/SOLR-6127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Thacker updated SOLR-6127: Attachment: SOLR-6127.patch The patch does a few things: 1. Removed all current exampledocs files 2. Added film.xml, film.json, film.csv and the license file 3. Added exampledocs_generator.py to the dev-tools folder 4. Modified schema.xml appropriately. Now we need to decide whether to rename the techproducts configset to film. Improve Solr's exampledocs data --- Key: SOLR-6127 URL: https://issues.apache.org/jira/browse/SOLR-6127 Project: Solr Issue Type: Improvement Components: documentation, scripts and tools Reporter: Varun Thacker Assignee: Erik Hatcher Fix For: 5.0, Trunk Attachments: LICENSE.txt, README.txt, README.txt, SOLR-6127.patch, film.csv, film.json, film.xml, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py, freebase_film_dump.py Currently - The CSV example has 10 documents. - The JSON example has 4 documents. - The XML example has 32 documents. 1. We should have equal number of documents and the same documents in all the example formats 2. A data set which is slightly more comprehensive. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5951: --- Attachment: LUCENE-5951.patch New patch, renaming to spins, and also unwrapping FileSwitchDir, and returning false for RAMDirectory. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6787) API to manage blobs in Solr
[ https://issues.apache.org/jira/browse/SOLR-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6787: - Description: A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries was: A special collection called .system needs to be created by the user to store/manage blobs. 
The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries API to manage blobs in Solr Key: SOLR-6787 URL: https://issues.apache.org/jira/browse/SOLR-6787 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6787.patch, SOLR-6787.patch A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 The config for this collection is automatically created . 
numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use
[jira] [Updated] (SOLR-6787) API to manage blobs in Solr
[ https://issues.apache.org/jira/browse/SOLR-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-6787: - Description: A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 #The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries was: A special collection called .system needs to be created by the user to store/manage blobs. 
The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream mycomponent.jar {code} Please note that the jars are never deleted. a new version is added to the system everytime a new jar is posted for the name. You must use the standard delete commands to delete the old entries API to manage blobs in Solr Key: SOLR-6787 URL: https://issues.apache.org/jira/browse/SOLR-6787 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6787.patch, SOLR-6787.patch A special collection called .system needs to be created by the user to store/manage blobs. 
The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection {code} #create your .system collection first http://localhost:8983/solr/admin/collections?action=CREATEname=.systemreplicationFactor=2 #The config for this collection is automatically created . numShards for this collection is hardcoded to 1 #create a new jar or add a new version of a jar curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point would give a list of jars and other details curl http://localhost:8983/solr/.system/blob # GET on the end point with jar name would give details of various versions of the available jars curl http://localhost:8983/solr/.system/blob/mycomponent # GET on the end point with jar name and version with a wt=filestream to get the actual file curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream mycomponent.1.jar # GET on the end point with jar name and wt=filestream to get the latest version of the file curl
[jira] [Commented] (SOLR-6850) AutoAddReplicas does not wait enough for a replica to get live
[ https://issues.apache.org/jira/browse/SOLR-6850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250057#comment-14250057 ] Mark Miller commented on SOLR-6850: --- Good catch Varun! I just took a look and this is actually fixed in Cloudera Search - whoops. I'll sync up and see if there are any other changes I have that are missing after committing this. AutoAddReplicas does not wait enough for a replica to get live -- Key: SOLR-6850 URL: https://issues.apache.org/jira/browse/SOLR-6850 Project: Solr Issue Type: Bug Affects Versions: 4.10, 4.10.1, 4.10.2, 5.0, Trunk Reporter: Varun Thacker Attachments: SOLR-6850.patch, SOLR-6850.patch After we have detected that a replica needs failing over, we add a replica and wait to see if it's live. Currently we only wait for 30ms, but I think the intention here was to wait for 30s. In CloudStateUtil.waitToSeeLive() the conversion should have been {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.SECONDS);}} instead of {{System.nanoTime() + TimeUnit.NANOSECONDS.convert(timeoutInMs, TimeUnit.MILLISECONDS);}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
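The unit mix-up described in this issue is easy to see in isolation. In this sketch a timeout value of 30 is assumed to be passed in: converting it with MILLISECONDS as the source unit yields a 30 ms deadline, while SECONDS yields the intended 30 s one.

```java
import java.util.concurrent.TimeUnit;

public class TimeoutUnits {
    public static void main(String[] args) {
        long timeout = 30;
        // Buggy: interprets the value as milliseconds -> 30 ms worth of nanos.
        long buggy = TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.MILLISECONDS);
        // Fixed: interprets the value as seconds -> 30 s worth of nanos.
        long fixed = TimeUnit.NANOSECONDS.convert(timeout, TimeUnit.SECONDS);
        System.out.println(buggy); // 30000000
        System.out.println(fixed); // 30000000000
    }
}
```

A factor-of-1000 difference in the deadline explains why the failover wait expired almost immediately.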
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250059#comment-14250059 ] Robert Muir commented on LUCENE-5951: - Thanks, i will take another crack at FSDir logic. we should be able to handle tmpfs etc better here (likely on mac, too). Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6559) Create an endpoint /update/xml/docs endpoint to do custom xml indexing
[ https://issues.apache.org/jira/browse/SOLR-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anurag Sharma updated SOLR-6559: Attachment: SOLR-6559.patch Attaching patch that can be applied on latest trunk. The XPathRecordReader doesn't support wild card. Either we have to implement the wildcard functionality or use another XPath parser. Also added a unit test (testSupportedWildCard) demonstrating the capability is unsupported. Also the patch has positive unit tests which are working. Create an endpoint /update/xml/docs endpoint to do custom xml indexing -- Key: SOLR-6559 URL: https://issues.apache.org/jira/browse/SOLR-6559 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch Just the way we have an json end point create an xml end point too. use the XPathRecordReader in DIH to do the same . The syntax would require slight tweaking to match the params of /update/json/docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250067#comment-14250067 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646288 from [~mikemccand] in branch 'dev/trunk' [ https://svn.apache.org/r1646288 ] LUCENE-6117: this test secretly relies on testPoint too infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, Trunk Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6117) infostream is currently unusable out of box
[ https://issues.apache.org/jira/browse/LUCENE-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250069#comment-14250069 ] ASF subversion and git services commented on LUCENE-6117: - Commit 1646289 from [~mikemccand] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646289 ] LUCENE-6117: this test secretly relies on testPoint too infostream is currently unusable out of box --- Key: LUCENE-6117 URL: https://issues.apache.org/jira/browse/LUCENE-6117 Project: Lucene - Core Issue Type: Bug Reporter: Robert Muir Fix For: 5.0, Trunk Attachments: LUCENE-6117.patch testpoints used to only be emitted by assertions (still sketchy), but now are emitted always. I assume this is due to the change to support running tests with assertions disabled. we should try to clean this up, simple stuff like this is now useless: {code} indexWriterConfig.setInfoStream(System.out); // causes massive flooding like this: // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start // TP 0 [Tue Dec 16 20:19:37 EST 2014; Thread-0]: DocumentsWriterPerThread addDocument start {code} I hit this several times today just trying to do benchmarks and debugging. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6554) Speed up overseer operations for collections with stateFormat 1
[ https://issues.apache.org/jira/browse/SOLR-6554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-6554: Attachment: SOLR-6554-workqueue-fixes.patch Slightly refactored. All tests pass. Speed up overseer operations for collections with stateFormat 1 - Key: SOLR-6554 URL: https://issues.apache.org/jira/browse/SOLR-6554 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 5.0, Trunk Reporter: Shalin Shekhar Mangar Attachments: SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-batching-refactor.patch, SOLR-6554-workqueue-fixes.patch, SOLR-6554-workqueue-fixes.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch, SOLR-6554.patch Right now (after SOLR-5473 was committed), a node watches a collection only if stateFormat=1 or if that node hosts at least one core belonging to that collection. This means that a node which is the overseer operates on all collections but watches only a few. So any read goes directly to zookeeper which slows down overseer operations. Let's have the overseer node watch all collections always and never remove those watches (except when the collection itself is deleted). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6120) how should MockIndexOutputWrapper.close handle exceptions in delegate.close
[ https://issues.apache.org/jira/browse/LUCENE-6120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250079#comment-14250079 ] Robert Muir commented on LUCENE-6120: - This is a test bug, but there are more bugs here in addition to this one. If close() is called multiple times, then the disk usage computation and internal ref counting (trace through removeIndexOutput()) are wrong. That violates Closeable.close(). how should MockIndexOutputWrapper.close handle exceptions in delegate.close --- Key: LUCENE-6120 URL: https://issues.apache.org/jira/browse/LUCENE-6120 Project: Lucene - Core Issue Type: Bug Components: core/store Reporter: Michael McCandless Priority: Minor Chasing a tricky Elasticsearch test failure, it came down to the delegate.close throwing an exception (ClosedByInterruptException, disturbingly, in this case), causing MockIndexOutputWrapper.close to fail to remove that IO from MDW's map. The question is: what should we do here, when delegate.close throws an exception? Is the delegate in fact closed, even when it throws an exception? Java 8's docs on java.io.Closeable say this: As noted in AutoCloseable.close(), cases where the close may fail require careful attention. It is strongly advised to relinquish the underlying resources and to internally mark the Closeable as closed, prior to throwing the IOException. And our OutputStreamIndexOutput is careful about this (it flushes, then closes, in a try-with-resources). So I think MDW should be fixed to mark the IO as closed even if delegate.close throws an exception... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
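The Closeable advice quoted in the issue can be illustrated with a minimal sketch (a hypothetical wrapper class, not the actual MockIndexOutputWrapper code): mark the wrapper closed before delegating, so that a throwing delegate.close() still leaves it in the closed state, and a second close() becomes a no-op instead of double-decrementing ref counts.

```java
import java.io.Closeable;
import java.io.IOException;

class SafeCloseWrapper implements Closeable {
  private final Closeable delegate;
  private boolean closed;

  SafeCloseWrapper(Closeable delegate) {
    this.delegate = delegate;
  }

  @Override
  public void close() throws IOException {
    if (closed) {
      return;        // second close() is a no-op, as Closeable requires
    }
    closed = true;    // mark closed first, even if the delegate throws
    delegate.close(); // e.g. ClosedByInterruptException may propagate,
                      // but the wrapper is already in the closed state
  }

  boolean isClosed() {
    return closed;
  }
}
```

With this ordering, bookkeeping that happens on the first close (removing the output from a tracking map, adjusting ref counts) runs exactly once regardless of how the delegate behaves.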
[JENKINS-MAVEN] Lucene-Solr-Maven-5.x #790: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-5.x/790/ 3 tests failed. FAILED: org.apache.solr.hadoop.MorphlineMapperTest.org.apache.solr.hadoop.MorphlineMapperTest Error Message: null Stack Trace: java.lang.AssertionError: null at __randomizedtesting.SeedInfo.seed([682F0CE7F59E9F9B]:0) at org.apache.lucene.util.TestRuleTemporaryFilesCleanup.before(TestRuleTemporaryFilesCleanup.java:105) at com.carrotsearch.randomizedtesting.rules.TestRuleAdapter$1.before(TestRuleAdapter.java:26) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:35) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:54) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365) at java.lang.Thread.run(Thread.java:745) FAILED: org.apache.solr.hadoop.MorphlineBasicMiniMRTest.testPathParts Error Message: Test abandoned because suite timeout was reached. Stack Trace: java.lang.Exception: Test abandoned because suite timeout was reached. at __randomizedtesting.SeedInfo.seed([197EC495E889594E]:0) FAILED: org.apache.solr.hadoop.MorphlineBasicMiniMRTest.org.apache.solr.hadoop.MorphlineBasicMiniMRTest Error Message: Suite timeout exceeded (= 720 msec). Stack Trace: java.lang.Exception: Suite timeout exceeded (= 720 msec). at __randomizedtesting.SeedInfo.seed([197EC495E889594E]:0) Build Log: [...truncated 53791 lines...] 
BUILD FAILED /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:552: The following error occurred while executing this line: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-Maven-5.x/build.xml:204: The following error occurred while executing this line: : Java returned: 1 Total time: 382 minutes 43 seconds Build step 'Invoke Ant' marked build as failure Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
Anshum Gupta created SOLR-6861: -- Summary: Remove example/exampledocs/post.sh as the concept of default update URL is almost gone Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6559) Create an endpoint /update/xml/docs endpoint to do custom xml indexing
[ https://issues.apache.org/jira/browse/SOLR-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250147#comment-14250147 ] Noble Paul commented on SOLR-6559: -- bq. The XPathRecordReader doesn't support wild card

It does. Look at the tests. Create an endpoint /update/xml/docs endpoint to do custom xml indexing -- Key: SOLR-6559 URL: https://issues.apache.org/jira/browse/SOLR-6559 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch Just the way we have a JSON endpoint, create an XML endpoint too. Use the XPathRecordReader from DIH to do the same. The syntax would require slight tweaking to match the params of /update/json/docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4492) Please add support for Collection API CREATE method to evenly distribute leader roles among instances
[ https://issues.apache.org/jira/browse/SOLR-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250149#comment-14250149 ] Tim Vaillancourt commented on SOLR-4492: Thanks Erick! Please add support for Collection API CREATE method to evenly distribute leader roles among instances - Key: SOLR-4492 URL: https://issues.apache.org/jira/browse/SOLR-4492 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Tim Vaillancourt Assignee: Erick Erickson Priority: Minor Fix For: 5.0, Trunk Currently in SolrCloud 4.1, a CREATE call to the Collection API will cause the server receiving the CREATE call to become the leader of all shards. I would like to ask for the ability for the CREATE call to evenly distribute the leader role across all instances, i.e.: if I create 3 shards over 3 SOLR 4.1 instances, each instance/node would only be the leader of 1 shard. This would be logically consistent with the way replicas are randomly distributed by this same call across instances/nodes. Currently, this CREATE call will cause the server receiving the call to become the leader of 3 shards. curl -v 'http://HOST:8983/solr/admin/collections?action=CREATE&name=test&numShards=3&replicationFactor=2&maxShardsPerNode=2' PS: Thank you SOLR developers for your contributions! Tim Vaillancourt -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250170#comment-14250170 ] ASF subversion and git services commented on SOLR-6861: --- Commit 1646297 from [~anshumg] in branch 'dev/trunk' [ https://svn.apache.org/r1646297 ] SOLR-6861: Remove post.sh from exampledocs Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta resolved SOLR-6861. Resolution: Fixed Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250174#comment-14250174 ] ASF subversion and git services commented on SOLR-6861: --- Commit 1646298 from [~anshumg] in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1646298 ] SOLR-6861: Remove post.sh from exampledocs (Merge from trunk) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-6852) SimplePostTool should no longer default to collection1
[ https://issues.apache.org/jira/browse/SOLR-6852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta resolved SOLR-6852. Resolution: Fixed SimplePostTool should no longer default to collection1 -- Key: SOLR-6852 URL: https://issues.apache.org/jira/browse/SOLR-6852 Project: Solr Issue Type: Improvement Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0 Attachments: SOLR-6852.patch, SOLR-6852.patch Solr no longer would be bootstrapped with collection1 and so it no longer makes sense for the SimplePostTool to default to collection1 either. Without an explicit collection/core/url value, the call should just fail fast. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6861) Remove example/exampledocs/post.sh as the concept of default update URL is almost gone
[ https://issues.apache.org/jira/browse/SOLR-6861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anshum Gupta updated SOLR-6861: --- Fix Version/s: Trunk 5.0 Remove example/exampledocs/post.sh as the concept of default update URL is almost gone -- Key: SOLR-6861 URL: https://issues.apache.org/jira/browse/SOLR-6861 Project: Solr Issue Type: Task Reporter: Anshum Gupta Assignee: Anshum Gupta Fix For: 5.0, Trunk We should remove post.sh and replace it with bin/post (SOLR-6435). post.sh right now has a hardcoded single core update URL i.e. http://localhost:8983/solr/update -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250219#comment-14250219 ] Hoss Man commented on LUCENE-5951: -- {noformat} + public static int AUTO_DETECT_MERGES_AND_THREADS = -1; {noformat} ...that's supposed to be final (a sentinel value), correct? nothing should be allowed to modify it at run time? {noformat} + public synchronized void setMaxMergesAndThreads(int maxMergeCount, int maxThreadCount) { +if (maxMergeCount == AUTO_DETECT_MERGES_AND_THREADS && maxThreadCount == AUTO_DETECT_MERGES_AND_THREADS) { + // OK + maxMergeCount = AUTO_DETECT_MERGES_AND_THREADS; + maxThreadCount = AUTO_DETECT_MERGES_AND_THREADS; {noformat} ...is that supposed to be setting this.maxMergeCount and this.maxThreadCount? ...it looks like it's just a no-op (and this.maxMergeCount and this.maxThreadCount never get set in this case?) {noformat} + public static boolean spins(Path path) throws IOException { {noformat} ...is it worth using a ternary enum (or nullable Boolean) here to track the difference between: * confident it's a spinning disk * confident it's not a spinning disk * unknown what type of storage this is ...that way we can make the default behavior of CMS conservative, and only be aggressive if we are confident it's not spinning; but app devs can be more aggressive -- call the same spins() utility and only use conservative values if they are confident it's a spinning disk, otherwise call setMaxMergesAndThreads with higher values. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. 
I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
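The Linux side of the rotational check described above can be sketched roughly like this. This is an illustration under an assumed sysfs layout, not the patch's actual spins() implementation (which starts from a Path and resolves the mount point first); the partition-stripping helper is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class SpinsSketch {

  // "sda1" -> "sda": /sys/block has entries per device, not per partition.
  // Looping (rather than stripping one digit) also handles names like "sda42".
  static String stripPartition(String deviceName) {
    while (!deviceName.isEmpty()
        && Character.isDigit(deviceName.charAt(deviceName.length() - 1))) {
      deviceName = deviceName.substring(0, deviceName.length() - 1);
    }
    return deviceName;
  }

  // Conservative: report "spinning" whenever the answer is unknown, so
  // defaults stay safe on non-Linux systems or exotic devices.
  static boolean spins(String deviceName) {
    Path flag = Paths.get("/sys/block", stripPartition(deviceName),
        "queue", "rotational");
    try {
      // the kernel reports "1" for rotational media, "0" for SSDs
      return "1".equals(new String(Files.readAllBytes(flag)).trim());
    } catch (IOException | SecurityException e) {
      return true; // non-Linux, odd device names, sandboxed JVMs, ...
    }
  }
}
```

Defaulting to "spinning" on any failure matches the conservative stance discussed in the comments: only be aggressive with merge threads when the heuristic is confident the storage is not rotational.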
[jira] [Reopened] (LUCENE-6019) IndexWriter allows to add same field with different docvlaues type
[ https://issues.apache.org/jira/browse/LUCENE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reopened LUCENE-6019: Assignee: Michael McCandless This commit caused LUCENE-6117, which Rob found and fixed (thanks!), but the fix is too big to backport to 4.10.x. I think to fix it, I should revert the -Dtests.asserts part of this change (but keep the original bug fix). IndexWriter allows to add same field with different docvlaues type --- Key: LUCENE-6019 URL: https://issues.apache.org/jira/browse/LUCENE-6019 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.10.1 Reporter: Simon Willnauer Assignee: Michael McCandless Priority: Critical Fix For: 4.10.2, 5.0, Trunk Attachments: LUCENE-6019.patch, LUCENE-6019.patch IndexWriter checks if the DV types are consistent in multiple places, but due to some problems in Elasticsearch users were able to add the same field with different DV types, causing merges to fail. Yet I was able to reduce this to a lucene testcase but I was puzzled since it always failed. Yet, I had to run it without assertions and that caused the bug to happen. I can add field foo with BINARY and SORTED_SET causing a merge to fail. Here is a gist https://gist.github.com/s1monw/8707f924b76ba40ee5f3 / https://github.com/elasticsearch/elasticsearch/issues/8009 While this is certainly a problem in Elasticsearch, Lucene also allows a user error to corrupt an index, which I think should be prevented. NOTE: this only fails if you run without assertions, which I think lucene should do in CI once in a while too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6019) IndexWriter allows to add same field with different docvlaues type
[ https://issues.apache.org/jira/browse/LUCENE-6019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250237#comment-14250237 ] Robert Muir commented on LUCENE-6019: - +1, I would rather not cause instability or false failures in the bugfix branch. IndexWriter allows to add same field with different docvlaues type --- Key: LUCENE-6019 URL: https://issues.apache.org/jira/browse/LUCENE-6019 Project: Lucene - Core Issue Type: Bug Components: core/index Affects Versions: 4.10.1 Reporter: Simon Willnauer Assignee: Michael McCandless Priority: Critical Fix For: 4.10.2, 5.0, Trunk Attachments: LUCENE-6019.patch, LUCENE-6019.patch IndexWriter checks if the DV types are consistent in multiple places, but due to some problems in Elasticsearch users were able to add the same field with different DV types, causing merges to fail. Yet I was able to reduce this to a lucene testcase but I was puzzled since it always failed. Yet, I had to run it without assertions and that caused the bug to happen. I can add field foo with BINARY and SORTED_SET causing a merge to fail. Here is a gist https://gist.github.com/s1monw/8707f924b76ba40ee5f3 / https://github.com/elasticsearch/elasticsearch/issues/8009 While this is certainly a problem in Elasticsearch, Lucene also allows a user error to corrupt an index, which I think should be prevented. NOTE: this only fails if you run without assertions, which I think lucene should do in CI once in a while too. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250241#comment-14250241 ] Robert Muir commented on LUCENE-5951: - I don't think we should make things complicated for app developers. We are not writing a generic spins() method for developers; it's a lucene.internal method for picking good defaults. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250243#comment-14250243 ] Michael McCandless commented on LUCENE-5951: bq. ...that's supposed to be final (a sentinel value), correct? nothing should be allowed to modify it at run time? Whoa, nice catch! I'll fix. bq. ...it looks like it's just a no-op ( Gak, good catch :) I'll add a test that exposes this, then fix it. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-6851) oom_solr.sh problems
[ https://issues.apache.org/jira/browse/SOLR-6851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Potter reassigned SOLR-6851: Assignee: Timothy Potter oom_solr.sh problems Key: SOLR-6851 URL: https://issues.apache.org/jira/browse/SOLR-6851 Project: Solr Issue Type: Bug Reporter: Hoss Man Assignee: Timothy Potter Fix For: 5.0 noticed 2 problems with the oom_solr.sh script... 1) the script is only being run with the port of the solr instance to terminate, so the log messages aren't getting written to the correct directory -- if we change the script to take a log dir/file as an argument, we can ensure the logs are written to the correct place 2) on my ubuntu linux machine (where /bin/sh is a symlink to /bin/dash), the console log is recording a script error when java runs oom_solr.sh... {noformat} # # java.lang.OutOfMemoryError: Java heap space # -XX:OnOutOfMemoryError=/home/hossman/lucene/5x_dev/solr/bin/oom_solr.sh 8983 # Executing /bin/sh -c /home/hossman/lucene/5x_dev/solr/bin/oom_solr.sh 8983... /home/hossman/lucene/5x_dev/solr/bin/oom_solr.sh: 20: [: 14305: unexpected operator Running OOM killer script for process 14305 for Solr on port 8983 Killed process 14305 {noformat} steps to reproduce: {{bin/solr -e techproducts -m 10m}} -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6559) Create an endpoint /update/xml/docs endpoint to do custom xml indexing
[ https://issues.apache.org/jira/browse/SOLR-6559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250268#comment-14250268 ] Anurag Sharma commented on SOLR-6559: - Looked for the wildcard '*' but couldn't find any unit test for it in TestXPathRecordReader. Create an endpoint /update/xml/docs endpoint to do custom xml indexing -- Key: SOLR-6559 URL: https://issues.apache.org/jira/browse/SOLR-6559 Project: Solr Issue Type: Bug Reporter: Noble Paul Assignee: Noble Paul Attachments: SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch, SOLR-6559.patch Just the way we have a JSON endpoint, create an XML endpoint too. Use the XPathRecordReader from DIH to do the same. The syntax would require slight tweaking to match the params of /update/json/docs -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-5951: --- Attachment: LUCENE-5951.patch New patch fixing Hoss's issues (thanks!). Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
Jason Wang created SOLR-6862: Summary: full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-5951: Attachment: LUCENE-5951.patch I cleaned up the code to remove the hashmap, not try to lookup 'rotational' for obviously bogus names (like nfs), return false for tmpfs, etc. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250327#comment-14250327 ] Hoss Man commented on LUCENE-5951: -- {noformat} +for (FileStore store : FileSystems.getDefault().getFileStores()) { + String desc = store.toString(); + int start = desc.lastIndexOf('('); + int end = desc.indexOf(')', start); + mountToDevice.put(desc.substring(0, start-1), desc.substring(start+1, end)); +} {noformat} ...I don't see anything in the javadocs for FileStore making any guarantees about the toString -- so the results of these lastIndexOf and indexOf calls should probably have bounds checks to prevent IOOBE from substring. (either that or just catch the IOOBE and give up) {noformat} +if (!devName.isEmpty() && Character.isDigit(devName.charAt(devName.length()-1))) { + devName = devName.substring(0, devName.length()-1); {noformat} ...what about people with lots of partitions? ie: /dev/sda42 Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
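A defensively parsed variant of the mount-to-device snippet quoted above might look like this. It is a hypothetical helper: it assumes, with no guarantee from the javadocs, that FileStore.toString() looks like "mountPoint (deviceName)", and skips any entry that does not match rather than risking a StringIndexOutOfBoundsException in substring().

```java
import java.util.HashMap;
import java.util.Map;

class MountParseSketch {

  // Parse "mountPoint (deviceName)" descriptions defensively; the
  // toString() format is observed, not specified, so bail out per
  // entry when the delimiters are missing or out of order.
  static Map<String, String> parse(Iterable<String> storeDescriptions) {
    Map<String, String> mountToDevice = new HashMap<>();
    for (String desc : storeDescriptions) {
      int start = desc.lastIndexOf('(');
      int end = desc.indexOf(')', start);
      if (start <= 0 || end <= start) {
        continue; // unexpected format: give up on this entry
      }
      mountToDevice.put(desc.substring(0, start - 1),
          desc.substring(start + 1, end));
    }
    return mountToDevice;
  }
}
```

In production code, the alternative mentioned in the comment (wrapping the whole heuristic in a catch that falls back to the conservative answer) achieves the same safety with less per-entry ceremony.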
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250332#comment-14250332 ] Robert Muir commented on LUCENE-5951: - {quote} ...I don't see anything in the javadocs for FileStore making any guarantees about the toString – so the results of these lastIndexOf and indexOf calls should probably have bounds checks to prevent IOOBE from substring. (either that or just catch the IOOBE and give up) {quote} Maybe you missed the try-catch when looking at the patch. {code} } catch (Exception ioe) { // our crazy heuristics can easily trigger SecurityException, AIOOBE, etc ... return true; } {code} {quote} ...what about people with lots of partitions? ie: /dev/sda42 {quote} Maybe if you quoted more of the context, you would see this was in a loop? Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250333#comment-14250333 ] Uwe Schindler commented on LUCENE-5951: --- +1 The heavy funny heuristics method is a masterpiece of coding in contrast to Hadoop's detection. I am so happy that it does not span df or mount commands! Many thanks :-) Java 7 is cool! Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250333#comment-14250333 ] Uwe Schindler edited comment on LUCENE-5951 at 12/17/14 7:12 PM: - +1 The heavy funny heuristics method is a masterpiece of coding in contrast to Hadoop's detection. I am so happy that it does not exec df or mount commands! Many thanks :-) Java 7 is cool! was (Author: thetaphi): +1 The heavy funny heuristics method is a masterpiece of coding in contrast to Hadoop's detection. I am so happy that it does not span df or mount commands! Many thanks :-) Java 7 is cool! Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6845) figure out why suggester causes slow startup - even when not used
[ https://issues.apache.org/jira/browse/SOLR-6845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250350#comment-14250350 ] Tomás Fernández Löbbe commented on SOLR-6845: - Trying to add some unit tests to this feature I found another issue. SuggestComponent and SpellcheckComponent rely on a {{firstSearcherListener}} to load (and in this case, also build) some structures. These firstSearcherListeners are registered on {{SolrCoreAware.inform()}}, however the first searcher listener task is only added to the queue of warming tasks if there is at least one listener registered at the time of the first searcher creation (before SolrCoreAware.inform() is ever called). See
{code:title=SolrCore.java}
if (currSearcher == null && firstSearcherListeners.size() > 0) {
  future = searcherExecutor.submit(new Callable() {
    @Override
    public Object call() throws Exception {
      try {
        for (SolrEventListener listener : firstSearcherListeners) {
          listener.newSearcher(newSearcher, null);
        }
      } catch (Throwable e) {
        SolrException.log(log, null, e);
        if (e instanceof Error) {
          throw (Error) e;
        }
      }
      return null;
    }
  });
}
{code}
I'll create a new Jira for this
figure out why suggester causes slow startup - even when not used - Key: SOLR-6845 URL: https://issues.apache.org/jira/browse/SOLR-6845 Project: Solr Issue Type: Bug Reporter: Hoss Man SOLR-6679 was filed to track the investigation into the following problem... {panel} The stock solrconfig provides a bad experience with a large index... start up Solr and it will spin at 100% CPU for minutes, unresponsive, while it apparently builds a suggester index. ... This is what I did: 1) indexed 10M very small docs (only takes a few minutes). 2) shut down Solr 3) start up Solr and watch it be unresponsive for over 4 minutes! I didn't even use any of the fields specified in the suggester config and I never called the suggest request handler.
{panel} ..but ultimately focused on removing/disabling the suggester from the sample configs. Opening this new issue to focus on actually trying to identify the root problem fix it. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6787) API to manage blobs in Solr
[ https://issues.apache.org/jira/browse/SOLR-6787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250397#comment-14250397 ] Yonik Seeley commented on SOLR-6787: bq. These are not really special APIs. I was responding to this: APIs need to be created to manage the content of that collection And I was wondering since binary field and blob seem synonymous, why there would be a separate/different API to get/set the value of such a field. bq. All the handlers loaded from .system will automatically be startup=lazy. But request handlers are one of the only things that have support for lazy. What's the plan to support custom SearchComponents, Update processors, QParsers, or ValueSourceParsers (all of those are very common)? Also, a big question is persistence. What happens when you add a request handler via API, and then the server is bounced? bq. We are rethinking the way Solr is being used. That's great, but please do so in public forums so everyone can participate in the discussion.
API to manage blobs in Solr Key: SOLR-6787 URL: https://issues.apache.org/jira/browse/SOLR-6787 Project: Solr Issue Type: Sub-task Reporter: Noble Paul Assignee: Noble Paul Fix For: 5.0, Trunk Attachments: SOLR-6787.patch, SOLR-6787.patch A special collection called .system needs to be created by the user to store/manage blobs. The schema/solrconfig of that collection need to be automatically supplied by the system so that there are no errors APIs need to be created to manage the content of that collection
{code}
#create your .system collection first
http://localhost:8983/solr/admin/collections?action=CREATE&name=.system&replicationFactor=2
#The config for this collection is automatically created. numShards for this collection is hardcoded to 1

#create a new jar or add a new version of a jar
curl -X POST -H 'Content-Type: application/octet-stream' --data-binary @mycomponent.jar http://localhost:8983/solr/.system/blob/mycomponent

# GET on the end point would give a list of jars and other details
curl http://localhost:8983/solr/.system/blob

# GET on the end point with jar name would give details of various versions of the available jars
curl http://localhost:8983/solr/.system/blob/mycomponent

# GET on the end point with jar name and version with a wt=filestream to get the actual file
curl http://localhost:8983/solr/.system/blob/mycomponent/1?wt=filestream > mycomponent.1.jar

# GET on the end point with jar name and wt=filestream to get the latest version of the file
curl http://localhost:8983/solr/.system/blob/mycomponent?wt=filestream > mycomponent.jar
{code}
Please note that the jars are never deleted. A new version is added to the system every time a new jar is posted for the name. You must use the standard delete commands to delete the old entries -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250508#comment-14250508 ] Mikhail Khludnev commented on SOLR-6862: please make sure you don't have autocommit enabled full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-6863) We should use finite timeouts when getting http connections from pools.
Mark Miller created SOLR-6863: - Summary: We should use finite timeouts when getting http connections from pools. Key: SOLR-6863 URL: https://issues.apache.org/jira/browse/SOLR-6863 Project: Solr Issue Type: Improvement Components: rCloud, SolrCloud Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-6863) We should use finite timeouts when getting http connections from pools.
[ https://issues.apache.org/jira/browse/SOLR-6863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-6863: -- Component/s: (was: rCloud) (was: SolrCloud) We should use finite timeouts when getting http connections from pools. --- Key: SOLR-6863 URL: https://issues.apache.org/jira/browse/SOLR-6863 Project: Solr Issue Type: Improvement Reporter: Mark Miller Assignee: Mark Miller -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
[ https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-6033: - Attachment: LUCENE-6033_boolean_resetInput_option.patch This patch adds a resetStream constructor option such that fillCache() will propagate reset() if this is set. Very simple. I enhanced the test for this and for isCached(). This option goes hand-in-hand with the use of isCached() for the use-case I had in mind by allowing you to pass a tokenStream to something that might not need it, thereby allowing you to not only toss the CachingTokenFilter if it wasn't actually cached, but avoid a redundant reset() call on the underlying input. Add CachingTokenFilter.isCached and switch LinkedList to ArrayList -- Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0, Trunk Attachments: LUCENE-6033.patch, LUCENE-6033_boolean_resetInput_option.patch CachingTokenFilter could use a simple boolean isCached() method implemented as-such: {code:java} /** If the underlying token stream was consumed and cached */ public boolean isCached() { return cache != null; } {code} It's useful for the highlighting code to remove its wrapping of CachingTokenFilter if after handing-off to parts of its framework it turns out that it wasn't used. Furthermore, use an ArrayList, not a LinkedList. ArrayList is leaner when the token count is high, and this class doesn't manipulate the list in a way that might favor LL. A separate patch will come that actually uses this method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250542#comment-14250542 ] Jason Wang commented on SOLR-6862: -- Thanks Mikhail, What should I do to disable autocommit? set commit=false like /dataimport?command=full-import&clean=true&commit=false&debug=false&indent=true&verbose=true&optimize=true&wt=json or in data-conf.xml datasource definition set autoCommit=false. Appreciate your help very much. Thanks again, Jason full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250552#comment-14250552 ] Mikhail Khludnev commented on SOLR-6862: neither ones, I mean Solr autocommit mentioned at http://wiki.apache.org/solr/SolrConfigXml full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
[ https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250595#comment-14250595 ] Robert Muir commented on LUCENE-6033: - I don't understand this option, why do we need it? How is it useful to the consumers that use CachingTokenFilter (like queryparser)? It seems more of an abusive case. Add CachingTokenFilter.isCached and switch LinkedList to ArrayList -- Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0, Trunk Attachments: LUCENE-6033.patch, LUCENE-6033_boolean_resetInput_option.patch CachingTokenFilter could use a simple boolean isCached() method implemented as-such: {code:java} /** If the underlying token stream was consumed and cached */ public boolean isCached() { return cache != null; } {code} It's useful for the highlighting code to remove its wrapping of CachingTokenFilter if after handing-off to parts of its framework it turns out that it wasn't used. Furthermore, use an ArrayList, not a LinkedList. ArrayList is leaner when the token count is high, and this class doesn't manipulate the list in a way that might favor LL. A separate patch will come that actually uses this method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250600#comment-14250600 ] Hoss Man commented on LUCENE-5951: -- bq. Maybe you missed the try-catch when looking at the patch. that still seems sketchy because it's only in the spins() method ... it's going to be trappy if/when this code gets refactored and getDeviceName is called from somewhere else. why not just include some basic exception handling in getDeviceName as well? bq. Maybe if you quoted more of the context, you would see this was in a loop? I did see that, but i didn't realize the purpose was to chomp away at individual digits in the path until it resolved as a valid file... too much voodoo for me, i'll shut up now. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250601#comment-14250601 ] Jason Wang commented on SOLR-6862: -- Hi Mikhail, I commented out autoCommit and autoSoftCommit in updateHandler block in solrconfig.xml, it works. Is there impact or side effect to do this? Thank you very much, Jason full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250607#comment-14250607 ] Mikhail Khludnev commented on SOLR-6862: pls check https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig#UpdateHandlersinSolrConfig-autoCommit and http://opensourceconnections.com/blog/2013/04/25/understanding-solr-soft-commits-and-data-durability/ and close this issue please. Thanks! full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
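For reference, the autoCommit settings Mikhail points to live in the updateHandler section of solrconfig.xml. The fragment below is illustrative only; the maxTime values are example numbers, not recommendations, and should be tuned for your own durability and near-real-time needs.

```xml
<!-- Example only: the autoCommit / autoSoftCommit knobs under discussion. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>          <!-- hard commit at most every 15s -->
    <openSearcher>false</openSearcher> <!-- durability only; don't reopen searchers -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>           <!-- soft commit for near-real-time visibility -->
  </autoSoftCommit>
</updateHandler>
```

Note that a hard autoCommit that fires mid-import makes the preceding documents durable, which is why a later DIH rollback can no longer undo them.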
[jira] [Commented] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250618#comment-14250618 ] Jason Wang commented on SOLR-6862: -- Thanks for your quick response and appreciate your help. full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Closed] (SOLR-6862) full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted
[ https://issues.apache.org/jira/browse/SOLR-6862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Wang closed SOLR-6862. Resolution: Not a Problem This is not an issue. full data import from jdbc datasource with connection failed problem, after rollback all the previous indexed data deleted -- Key: SOLR-6862 URL: https://issues.apache.org/jira/browse/SOLR-6862 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.7.1 Reporter: Jason Wang -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250629#comment-14250629 ] Robert Muir commented on LUCENE-5951: - The method is private. its not getting called from anywhere else. when an exception strikes we *need* it, so that it causes the whole thing to return true. it also has a comment above it '// these are hacks that are not guaranteed'. Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-5951) Detect when index is on SSD and set dynamic defaults
[ https://issues.apache.org/jira/browse/LUCENE-5951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14250635#comment-14250635 ] Robert Muir commented on LUCENE-5951: - {quote} I did see that, but i didn't realize the purpose was to chomp away at individual digits in the path until it resolved as a valid file... {quote} It has this comment: {code} // tear away partition numbers until we find it. {code} Detect when index is on SSD and set dynamic defaults Key: LUCENE-5951 URL: https://issues.apache.org/jira/browse/LUCENE-5951 Project: Lucene - Core Issue Type: Improvement Reporter: Michael McCandless Assignee: Michael McCandless Attachments: LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch, LUCENE-5951.patch E.g. ConcurrentMergeScheduler should default maxMergeThreads to 3 if it's on SSD and 1 if it's on spinning disks. I think the new NIO2 APIs can let us figure out which device we are mounted on, and from there maybe we can do os-specific stuff e.g. look at /sys/block/dev/queue/rotational to see if it's spinning storage or not ... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
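The "tear away partition numbers" loop Robert describes can be sketched like this. The method name and the Predicate standing in for the real block-device lookup are assumptions for illustration, not the patch's actual code.

```java
import java.util.function.Predicate;

// Sketch only: strip trailing digits from a device name (e.g. sda42 -> sda)
// until some lookup recognizes it, or until no trailing digit remains.
public class PartitionChomp {
    static String chompDigits(String devName, Predicate<String> exists) {
        while (!devName.isEmpty()
               && !exists.test(devName)
               && Character.isDigit(devName.charAt(devName.length() - 1))) {
            devName = devName.substring(0, devName.length() - 1);
        }
        return devName;
    }

    public static void main(String[] args) {
        // Pretend only the bare disk name is a known block device.
        Predicate<String> known = s -> s.equals("sda");
        System.out.println(chompDigits("sda42", known)); // sda
    }
}
```

This simple digit-chomping also hints at why names like nvme0n1p3 need more care: the bare disk name itself ends in a digit.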
[jira] [Commented] (LUCENE-6033) Add CachingTokenFilter.isCached and switch LinkedList to ArrayList
[ https://issues.apache.org/jira/browse/LUCENE-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250666#comment-14250666 ] David Smiley commented on LUCENE-6033: -- Hi Rob. I think it's easier to make the case that CachingTokenFilter should have been propagating reset() from its fillCache() all along, and thus you would then use CachingTokenFilter in a more normal way -- wrap it and call reset() then increment in a loop, etc., instead of knowing you need to reset() on what it wraps but not this token filter itself. That's weird. It's abnormal for a TokenFilter to never propagate reset, so every user of CachingTokenFilter to date has worked around this by calling reset() on the underlying input _instead_ of the final wrapping token filter (CachingTokenFilter in this case). To be clear, CachingTokenFilter._reset()_ didn't and still doesn't with this patch propagate reset(); it happens the one time it consumes the stream indirectly via incrementToken(). The exact case that brought me here is as follows: DefaultSolrHighlighter has a block of code activated when you pass hl.usePhraseHighlighter (around line 501). This block of code calls getPhraseHighlighter passing in a token stream that may never actually be used by that method. This is an extension point for subclassing; our shipped code doesn't use it at all. Prior to me doing SOLR-6680, we'd always then pass the CachingTokenFilter further on into the Highlighter. But unless getPhraseHighlighter actually uses the token stream, doing this is a waste (needless caching of every token -- pretty bulky). So with isCached() I can now see if it was used, and if not then toss the CachingTokenFilter aside. The problem is that isCached() isn't enough here; I overlooked it in SOLR-6680 (no test for this extension point). I was hoping to simply declare that if you want to use this token stream, you need to call reset() on it first. But CachingTokenFilter doesn't propagate the reset()!
So it won't get reset. I _could_ add a reset on the underlying stream before calling getPhraseHighlighter but doing so would likely result in reset() being called twice in a row when the caching isn't needed; Highlighter calls reset(). Test assertions trip when this happens, although I think in practice it's fine. Add CachingTokenFilter.isCached and switch LinkedList to ArrayList -- Key: LUCENE-6033 URL: https://issues.apache.org/jira/browse/LUCENE-6033 Project: Lucene - Core Issue Type: Improvement Reporter: David Smiley Assignee: David Smiley Fix For: 5.0, Trunk Attachments: LUCENE-6033.patch, LUCENE-6033_boolean_resetInput_option.patch CachingTokenFilter could use a simple boolean isCached() method implemented as-such: {code:java} /** If the underlying token stream was consumed and cached */ public boolean isCached() { return cache != null; } {code} It's useful for the highlighting code to remove its wrapping of CachingTokenFilter if after handing-off to parts of its framework it turns out that it wasn't used. Furthermore, use an ArrayList, not a LinkedList. ArrayList is leaner when the token count is high, and this class doesn't manipulate the list in a way that might favor LL. A separate patch will come that actually uses this method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
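The reset()-propagation behavior being debated can be made concrete with a tiny mock. These Stream/ListStream/CachingFilter classes are invented stand-ins, not Lucene's TokenStream/CachingTokenFilter API; the resetInput flag models the constructor option in the patch: propagate reset() to the wrapped input once, when the cache is first filled, so callers can treat the caching filter like any other filter.

```java
import java.util.ArrayList;
import java.util.List;

// Mock of the idea only (NOT Lucene's API): a caching filter that optionally
// resets its input the first time it fills its cache.
public class CachingDemo {
    interface Stream {
        void reset();
        String next(); // null at end of stream
    }

    static class ListStream implements Stream {
        final List<String> tokens; int pos = 0; int resets = 0;
        ListStream(List<String> tokens) { this.tokens = tokens; }
        public void reset() { resets++; pos = 0; }
        public String next() { return pos < tokens.size() ? tokens.get(pos++) : null; }
    }

    static class CachingFilter implements Stream {
        final Stream input;
        final boolean resetInput; // the constructor option under discussion
        List<String> cache; int pos = 0;
        CachingFilter(Stream input, boolean resetInput) {
            this.input = input; this.resetInput = resetInput;
        }
        boolean isCached() { return cache != null; }
        public void reset() { pos = 0; } // replays the cache; never touches input
        public String next() {
            if (cache == null) fillCache();
            return pos < cache.size() ? cache.get(pos++) : null;
        }
        private void fillCache() {
            if (resetInput) input.reset(); // propagate once, on first consumption
            cache = new ArrayList<>();
            for (String t = input.next(); t != null; t = input.next()) cache.add(t);
        }
    }

    public static void main(String[] args) {
        ListStream in = new ListStream(List.of("a", "b"));
        CachingFilter f = new CachingFilter(in, true);
        System.out.println(f.isCached()); // false until first consumption
        while (f.next() != null) { }      // consume; fills the cache
        f.reset();                        // replay from cache only
        System.out.println(f.next() + " resets=" + in.resets);
    }
}
```

With resetInput=true the caller never needs to know about the wrapped input; isCached() lets the caller discard the wrapper if it was never consumed, which is exactly the DefaultSolrHighlighter scenario described above.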
[jira] [Commented] (LUCENE-6118) Improve efficiency of the history structure for filter caching
[ https://issues.apache.org/jira/browse/LUCENE-6118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14250673#comment-14250673 ] Ryan Ernst commented on LUCENE-6118: +1 For {{IntBag.remove()}} when the frequency reaches 0, could you find the end of the chain and move that back into the slot that was just broken? This would then require moving up to only one element of the bag, instead of re-adding all elements after the old value in the chain. Something like:
{code}
if (newFreq == 0) {
  // move the last key in the chain back into this zeroed slot
  int slot2 = (slot + 1) & mask;
  while (freqs[slot2] != 0) {
    slot2 = (slot2 + 1) & mask;
  }
  keys[slot] = keys[slot2];
  freqs[slot] = freqs[slot2];
}
{code}
Improve efficiency of the history structure for filter caching -- Key: LUCENE-6118 URL: https://issues.apache.org/jira/browse/LUCENE-6118 Project: Lucene - Core Issue Type: Improvement Reporter: Adrien Grand Assignee: Adrien Grand Priority: Minor Attachments: LUCENE-6118.patch The filter caching uses a ring buffer that tracks frequencies of the hashcodes of the most-recently used filters. However it is based on an ArrayDeque<Integer> and a HashMap<Integer,Integer> which keep on (un)wrapping ints. Since the data-structure is very simple, we could try to do something better... -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
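One subtlety with "move the last key of the chain back": in linear probing that key may belong to a *later* home slot, and moving it before its home slot would make it unfindable. The hedged sketch below (class name, toy identity hash, and all method names are invented; this is not the LUCENE-6118 patch) uses the standard backward-shift variant, which moves a key into the gap only when its home slot still precedes the gap.

```java
// Sketch only: tombstone-free removal from a linear-probing frequency bag.
public class IntBagRemoveDemo {
    final int[] keys, freqs;
    final int mask;

    IntBagRemoveDemo(int capacityPowerOfTwo) {
        keys = new int[capacityPowerOfTwo];
        freqs = new int[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    int home(int key) { return key & mask; } // toy hash for the demo

    void add(int key) {
        int slot = home(key);
        while (freqs[slot] != 0 && keys[slot] != key) slot = (slot + 1) & mask;
        keys[slot] = key;
        freqs[slot]++;
    }

    int freq(int key) {
        for (int slot = home(key); freqs[slot] != 0; slot = (slot + 1) & mask) {
            if (keys[slot] == key) return freqs[slot];
        }
        return 0;
    }

    // Assumes the key is present in the bag.
    void remove(int key) {
        int slot = home(key);
        while (freqs[slot] == 0 || keys[slot] != key) slot = (slot + 1) & mask;
        if (--freqs[slot] > 0) return;
        // backward-shift: compact the probe run that follows the freed slot
        int free = slot;
        for (int cur = (free + 1) & mask; freqs[cur] != 0; cur = (cur + 1) & mask) {
            int h = home(keys[cur]);
            // move keys[cur] back only if its home slot is not strictly
            // inside the gap..cur range (cyclic distance comparison)
            if (((cur - h) & mask) >= ((cur - free) & mask)) {
                keys[free] = keys[cur];
                freqs[free] = freqs[cur];
                freqs[cur] = 0;
                free = cur;
            }
        }
    }

    public static void main(String[] args) {
        IntBagRemoveDemo bag = new IntBagRemoveDemo(8);
        bag.add(2); bag.add(10); bag.add(3); // 10 and 3 spill into later slots
        bag.remove(2);
        System.out.println(bag.freq(10) + " " + bag.freq(3)); // both stay findable
    }
}
```

In the main() scenario a naive "move the last key back" would put key 3 before its home slot 3, so freq(3) would probe slots 3 and 4 and miss it; the cyclic-distance check avoids that.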
[jira] [Updated] (SOLR-6736) A collections-like request handler to manage solr configurations on zookeeper
[ https://issues.apache.org/jira/browse/SOLR-6736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Rajput updated SOLR-6736: --- Fix Version/s: Trunk 5.0 A collections-like request handler to manage solr configurations on zookeeper - Key: SOLR-6736 URL: https://issues.apache.org/jira/browse/SOLR-6736 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Varun Rajput Priority: Minor Fix For: 5.0, Trunk Managing Solr configuration files on zookeeper becomes cumbersome while using solr in cloud mode, especially while trying out changes in the configurations. It will be great if there is a request handler that can provide an API to manage the configurations similar to the collections handler that would allow actions like uploading new configurations, linking them to a collection, deleting configurations, etc. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-NightlyTests-5.x - Build # 706 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-NightlyTests-5.x/706/

2 tests failed.

REGRESSION:  org.apache.solr.cloud.ChaosMonkeySafeLeaderTest.testDistribSearch

Error Message:
Test abandoned because suite timeout was reached.

Stack Trace:
java.lang.Exception: Test abandoned because suite timeout was reached.
	at __randomizedtesting.SeedInfo.seed([C8158DE0FACE4CDF]:0)

FAILED:  junit.framework.TestSuite.org.apache.solr.cloud.ChaosMonkeySafeLeaderTest

Error Message:
Suite timeout exceeded (>= 720 msec).

Stack Trace:
java.lang.Exception: Suite timeout exceeded (>= 720 msec).
	at __randomizedtesting.SeedInfo.seed([C8158DE0FACE4CDF]:0)

Build Log:
[...truncated 11273 lines...]
   [junit4] Suite: org.apache.solr.cloud.ChaosMonkeySafeLeaderTest
   [junit4]   2> Creating dataDir: /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/build/solr-core/test/J1/temp/solr.cloud.ChaosMonkeySafeLeaderTest-C8158DE0FACE4CDF-001/init-core-data-001
   [junit4]   2> 1224614 T48440 oas.SolrTestCaseJ4.buildSSLConfig Randomized ssl (false) and clientAuth (true)
   [junit4]   2> 1224614 T48440 oas.BaseDistributedSearchTestCase.initHostContext Setting hostContext system property: /
   [junit4]   2> 1224620 T48440 oas.SolrTestCaseJ4.setUp ###Starting testDistribSearch
   [junit4]   2> 1224621 T48440 oasc.ZkTestServer.run STARTING ZK TEST SERVER
   [junit4]   1> client port:0.0.0.0/0.0.0.0:0
   [junit4]   2> 1224622 T48441 oasc.ZkTestServer$ZKServerMain.runFromConfig Starting server
   [junit4]   2> 1224722 T48440 oasc.ZkTestServer.run start zk server on port:14104
   [junit4]   2> 1224723 T48440 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
   [junit4]   2> 1224724 T48440 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
   [junit4]   2> 1224728 T48448 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@15fafb22 name:ZooKeeperConnection Watcher:127.0.0.1:14104 got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 1224729 T48440 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
   [junit4]   2> 1224729 T48440 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
   [junit4]   2> 1224730 T48440 oascc.SolrZkClient.makePath makePath: /solr
   [junit4]   2> 1224733 T48440 oascc.SolrZkClient.createZkCredentialsToAddAutomatically Using default ZkCredentialsProvider
   [junit4]   2> 1224734 T48440 oascc.ConnectionManager.waitForConnected Waiting for client to connect to ZooKeeper
   [junit4]   2> 1224735 T48451 oascc.ConnectionManager.process Watcher org.apache.solr.common.cloud.ConnectionManager@6d3094a3 name:ZooKeeperConnection Watcher:127.0.0.1:14104/solr got event WatchedEvent state:SyncConnected type:None path:null path:null type:None
   [junit4]   2> 1224736 T48440 oascc.ConnectionManager.waitForConnected Client is connected to ZooKeeper
   [junit4]   2> 1224736 T48440 oascc.SolrZkClient.createZkACLProvider Using default ZkACLProvider
   [junit4]   2> 1224736 T48440 oascc.SolrZkClient.makePath makePath: /collections/collection1
   [junit4]   2> 1224739 T48440 oascc.SolrZkClient.makePath makePath: /collections/collection1/shards
   [junit4]   2> 1224741 T48440 oascc.SolrZkClient.makePath makePath: /collections/control_collection
   [junit4]   2> 1224742 T48440 oascc.SolrZkClient.makePath makePath: /collections/control_collection/shards
   [junit4]   2> 1224744 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/solrconfig-tlog.xml to /configs/conf1/solrconfig.xml
   [junit4]   2> 1224744 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.xml
   [junit4]   2> 1224747 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/schema15.xml to /configs/conf1/schema.xml
   [junit4]   2> 1224748 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/schema.xml
   [junit4]   2> 1224850 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/solrconfig.snippet.randomindexconfig.xml to /configs/conf1/solrconfig.snippet.randomindexconfig.xml
   [junit4]   2> 1224851 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/solrconfig.snippet.randomindexconfig.xml
   [junit4]   2> 1224853 T48440 oasc.AbstractZkTestCase.putConfig put /usr/home/jenkins/jenkins-slave/workspace/Lucene-Solr-NightlyTests-5.x/solr/core/src/test-files/solr/collection1/conf/stopwords.txt to /configs/conf1/stopwords.txt
   [junit4]   2> 1224853 T48440 oascc.SolrZkClient.makePath makePath: /configs/conf1/stopwords.txt
   [junit4]   2> 1224855 T48440