[jira] [Comment Edited] (SOLR-7393) HDFS poor indexing performance

2016-06-07 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318760#comment-15318760
 ] 

Hari Sekhon edited comment on SOLR-7393 at 6/7/16 4:06 PM:
---

The write latency was measurably and consistently much higher using the code I 
mentioned above. The throughput when indexing from Hadoop via Hive/Pig was much 
worse too; the details are also mentioned above.

The only thing I changed in the config was the backend from single local mount 
point to HDFS directory factory (with Kerberos security settings enabled) as I 
was running out of space on single disk (SOLR-7256) and hoped to use the more 
scalable HDFS storage space I had.
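
For reference, the kind of backend switch described above (local dataDir to the HDFS 
directory factory, with Kerberos enabled) is normally driven by a handful of settings. 
The sketch below shows the standard system-property form from the Solr Reference Guide, 
not the actual cluster config: the HDFS URI, keytab and principal are placeholders, it 
assumes the stock solrconfig.xml property placeholders, and the block-cache write path 
is shown disabled per the SOLR-7255 advice referenced in this issue.

{code}
# Illustrative sketch only -- placeholder values, not the actual cluster settings.
# Standard system properties for pointing SolrCloud at the HDFS directory factory,
# with the block-cache write path disabled per the SOLR-7255 advice above.
java -Dsolr.directoryFactory=HdfsDirectoryFactory \
     -Dsolr.lock.type=hdfs \
     -Dsolr.hdfs.home=hdfs://nameservice1/solr \
     -Dsolr.hdfs.confdir=/etc/hadoop/conf \
     -Dsolr.hdfs.blockcache.write.enabled=false \
     -Dsolr.hdfs.security.kerberos.enabled=true \
     -Dsolr.hdfs.security.kerberos.keytabfile=/etc/security/keytabs/solr.service.keytab \
     -Dsolr.hdfs.security.kerberos.principal=solr/_HOST@EXAMPLE.COM \
     -jar start.jar
{code}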


was (Author: harisekhon):
The difference in write latency was measurably and consistently much higher 
using the code I mentioned above. The throughput when indexing from Hadoop via 
Hive/Pig was much worse too, details also mentioned above.

The only thing I changed in the config was the backend from single local mount 
point to HDFS directory factory (with Kerberos security settings enabled) as I 
was running out of space on single disk (SOLR-7256) and hoped to use the more 
scalable HDFS storage space I had.

> HDFS poor indexing performance
> --
>
> Key: SOLR-7393
> URL: https://issues.apache.org/jira/browse/SOLR-7393
> Project: Solr
>  Issue Type: Bug
>  Components: Hadoop Integration, hdfs, SolrCloud
>Affects Versions: 4.7.2, 4.10.3
> Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
>Reporter: Hari Sekhon
>Priority: Critical
>
> When switching SolrCloud from local dataDir to HDFS directory factory 
> indexing performance falls through the floor.
> I've also observed very high latency on both QTime and code timer on HDFS 
> writes compared to local dataDir writes (using check_solr_write.pl from 
> https://github.com/harisekhon/nagios-plugins). Single test document write 
> latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 
> on some runs.
> A previous bulk online indexing job from Hive to SolrCloud that took 2 hours 
> for 620M rows ended up taking a projected 20+ hours and never completing, 
> usually breaking around the 16-17 hour timeframe when left overnight.
> It's worth noting that I had to disable the HDFS write cache which was 
> causing index corruption (SOLR-7255) on the advice of Mark Miller, who tells 
> me this doesn't make much performance difference anyway.
> This is probably also related to SolrCloud not respecting HDFS replication 
> factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but 
> that solely doesn't account for the massive performance drop going from 
> vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
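
For context, the single-document write probe referred to above can be approximated with 
a plain timed update request. This is only an illustrative stand-in (host and collection 
name are placeholders), not the check_solr_write.pl plugin itself:

{code}
# Illustrative stand-in for a single-document write-latency probe.
# Host and collection name are placeholders; this is not check_solr_write.pl itself.
time curl -s 'http://host:8983/solr/mycollection/update?commit=true' \
     -H 'Content-Type: application/json' \
     -d '[{"id":"latency-probe-1"}]'
{code}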






[jira] [Comment Edited] (SOLR-7393) HDFS poor indexing performance

2016-06-07 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318760#comment-15318760
 ] 

Hari Sekhon edited comment on SOLR-7393 at 6/7/16 4:06 PM:
---

The write latency was measurably and consistently much higher using the code I 
mentioned above. The throughput when indexing from Hadoop via Hive/Pig was much 
worse too; the details are also mentioned above.

The only thing I changed in the config was the backend from single local mount 
point to HDFS directory factory (with Kerberos security settings enabled) as I 
was running out of space on single disk (SOLR-7256) and hoped to use the more 
scalable HDFS storage space I had.


was (Author: harisekhon):
The difference in write latency was measurably and consistently much higher 
using the code I mentioned above. The throughput when indexing from Hadoop via 
Hive/Pig was much worse too, details also mentioned above.

The only thing I changed in the config was the backend from single local mount 
point to HDFS directory factory (with Kerberos security settings enabled) as I 
was running out of space on single disk (SOLR-7256) and hoped to use the more 
scalable HDFS storage space I had.

> HDFS poor indexing performance
> --
>
> Key: SOLR-7393
> URL: https://issues.apache.org/jira/browse/SOLR-7393
> Project: Solr
>  Issue Type: Bug
>  Components: Hadoop Integration, hdfs, SolrCloud
>Affects Versions: 4.7.2, 4.10.3
> Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
>Reporter: Hari Sekhon
>Priority: Critical
>
> When switching SolrCloud from local dataDir to HDFS directory factory 
> indexing performance falls through the floor.
> I've also observed very high latency on both QTime and code timer on HDFS 
> writes compared to local dataDir writes (using check_solr_write.pl from 
> https://github.com/harisekhon/nagios-plugins). Single test document write 
> latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 
> on some runs.
> A previous bulk online indexing job from Hive to SolrCloud that took 2 hours 
> for 620M rows ended up taking a projected 20+ hours and never completing, 
> usually breaking around the 16-17 hour timeframe when left overnight.
> It's worth noting that I had to disable the HDFS write cache which was 
> causing index corruption (SOLR-7255) on the advice of Mark Miller, who tells 
> me this doesn't make much performance difference anyway.
> This is probably also related to SolrCloud not respecting HDFS replication 
> factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but 
> that solely doesn't account for the massive performance drop going from 
> vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-7393) HDFS poor indexing performance

2016-06-07 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318760#comment-15318760
 ] 

Hari Sekhon commented on SOLR-7393:
---

The write latency was measurably and consistently much higher using the code I 
mentioned above. The throughput when indexing from Hadoop via Hive/Pig was much 
worse too; the details are also mentioned above.

The only thing I changed in the config was the backend from single local mount 
point to HDFS directory factory (with Kerberos security settings enabled) as I 
was running out of space on single disk (SOLR-7256) and hoped to use the more 
scalable HDFS storage space I had.

> HDFS poor indexing performance
> --
>
> Key: SOLR-7393
> URL: https://issues.apache.org/jira/browse/SOLR-7393
> Project: Solr
>  Issue Type: Bug
>  Components: Hadoop Integration, hdfs, SolrCloud
>Affects Versions: 4.7.2, 4.10.3
> Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
>Reporter: Hari Sekhon
>Priority: Critical
>
> When switching SolrCloud from local dataDir to HDFS directory factory 
> indexing performance falls through the floor.
> I've also observed very high latency on both QTime and code timer on HDFS 
> writes compared to local dataDir writes (using check_solr_write.pl from 
> https://github.com/harisekhon/nagios-plugins). Single test document write 
> latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 
> on some runs.
> A previous bulk online indexing job from Hive to SolrCloud that took 2 hours 
> for 620M rows ended up taking a projected 20+ hours and never completing, 
> usually breaking around the 16-17 hour timeframe when left overnight.
> It's worth noting that I had to disable the HDFS write cache which was 
> causing index corruption (SOLR-7255) on the advice of Mark Miller, who tells 
> me this doesn't make much performance difference anyway.
> This is probably also related to SolrCloud not respecting HDFS replication 
> factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but 
> that solely doesn't account for the massive performance drop going from 
> vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-7256) Multiple data dirs

2016-06-07 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318574#comment-15318574
 ] 

Hari Sekhon commented on SOLR-7256:
---

FYI, this was co-located on a Hadoop cluster, where using RAID would have meant 
destroying the existing HDFS data and making the nodes unsuitable for Hadoop cluster 
usage. Conversely, storing the indices on HDFS resulted in severe performance 
degradation (e.g. SOLR-7393), which is why the Elastic.co folks never wanted to put 
their indices on HDFS, having reported similar performance issues.

> Multiple data dirs
> --
>
> Key: SOLR-7256
> URL: https://issues.apache.org/jira/browse/SOLR-7256
> Project: Solr
>  Issue Type: New Feature
>Affects Versions: 4.10.3
> Environment: HDP 2.2 / HDP Search
>Reporter: Hari Sekhon
>
> Request to support multiple dataDirs as indexing a large collection fills up 
> only one of many disks in modern servers (think colocating on Hadoop servers 
> with many disks).
> While HDFS is another alternative, it results in poor performance and index 
> corruption under high online indexing loads (SOLR-7255).
> While it should be possible to create multiple cores with different dataDirs, 
> that would be very difficult to manage and would not scale well operationally, 
> so I think Solr should support multiple dataDirs natively.
> Regards,
> Hari Sekhon
> http://www.linkedin.com/in/harisekhon
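
For illustration, the per-core approach mentioned in the description (multiple cores, 
each with its own dataDir on a different disk) can be driven through the CoreAdmin API. 
A minimal sketch, with host, core names, instanceDirs and dataDirs as placeholders:

{code}
# Minimal sketch of the per-core dataDir workaround described above: one core per disk.
# Host, core names, instanceDirs and dataDirs are placeholders.
curl 'http://host:8983/solr/admin/cores?action=CREATE&name=collection1_disk1&instanceDir=/opt/solr/collection1_disk1&dataDir=/data1/solr/collection1_disk1'
curl 'http://host:8983/solr/admin/cores?action=CREATE&name=collection1_disk2&instanceDir=/opt/solr/collection1_disk2&dataDir=/data2/solr/collection1_disk2'
{code}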






[jira] [Created] (SOLR-9151) solr -e cloud broken if $PWD != $SOLR_HOME on Solr 5.x/6.x

2016-05-23 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-9151:
-

 Summary: solr -e cloud broken if $PWD != $SOLR_HOME on Solr 5.x/6.x
 Key: SOLR-9151
 URL: https://issues.apache.org/jira/browse/SOLR-9151
 Project: Solr
  Issue Type: Bug
Affects Versions: 6.0, 5.5
 Environment: Solr Docker Container
Reporter: Hari Sekhon
Priority: Minor


The solr script's cloud example breaks if called from a directory other than 
$SOLR_HOME, i.e. when $PWD is not $SOLR_HOME: it always strips off the beginning of 
the path. This used to work regardless of $PWD in Solr 4.x, which I used quite a lot, 
and it still works in my custom Solr 4.x Docker containers; it's only broken in 5.x/6.0.

Here is an example of the issue:
{code}docker run -ti solr bash
solr@5083b8e59d49:/opt/solr$ cd /
solr@5083b8e59d49:/$ solr -e cloud

Welcome to the SolrCloud example!

This interactive session will help you launch a SolrCloud cluster on your local 
workstation.
To begin, how many Solr nodes would you like to run in your local cluster? 
(specify 1-4 nodes) [2]: 

Ok, let's start up 2 Solr nodes for your example SolrCloud cluster.
Please enter the port for node1 [8983]: 

Please enter the port for node2 [7574]: 

Creating Solr home directory /opt/solr/example/cloud/node1/solr
Cloning /opt/solr/example/cloud/node1 into
   /opt/solr/example/cloud/node2

Starting up Solr on port 8983 using command:
/opt/solr/bin/solr start -cloud -p 8983 -s "pt/solr/example/cloud/node1/solr"

Solr home directory pt/solr/example/cloud/node1/solr not found!

ERROR: Process exited with an error: 1 (Exit value: 1)
 {code}
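
A workaround consistent with the behaviour above (an assumption on my part, not a 
confirmed fix) is to run the example from the Solr install directory so that $PWD 
matches $SOLR_HOME:

{code}
# Assumed workaround, not a confirmed fix: run the cloud example from the
# install directory so $PWD matches $SOLR_HOME (path is the Docker image default).
cd /opt/solr && bin/solr -e cloud
{code}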






[jira] [Updated] (SOLR-7398) Major imbalance between different shard numDocs in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7398:
--
Summary: Major imbalance between different shard numDocs in SolrCloud on 
HDFS  (was: Major imbalance between different shard doc counts in SolrCloud on 
HDFS)

 Major imbalance between different shard numDocs in SolrCloud on HDFS
 

 Key: SOLR-7398
 URL: https://issues.apache.org/jira/browse/SOLR-7398
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDoc imbalance between shards in a collection such as 
 6k vs 193k docs between the 2 different shards.
 See attached screenshots which show the shards and replicas as well as the 
 core UI output of each of the shard cores taken at the same time.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon
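
Per-shard and per-replica document counts of the kind shown in the screenshots can also 
be compared from the command line. A minimal sketch, querying each core directly with 
distrib=false; host and core names are placeholders:

{code}
# Minimal sketch: hit each shard core directly (distrib=false) and compare numFound.
# Host and core names are placeholders.
curl 'http://host:8983/solr/myCollection_shard1_replica1/select?q=*:*&rows=0&distrib=false&wt=json'
curl 'http://host:8983/solr/myCollection_shard2_replica1/select?q=*:*&rows=0&distrib=false&wt=json'
{code}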






[jira] [Updated] (SOLR-7395) Major numDocs inconsistency between leader and follower replicas in SolrCloud on HDFS, 20k vs 193k

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7395:
--
Summary: Major numDocs inconsistency between leader and follower replicas 
in SolrCloud on HDFS, 20k vs 193k  (was: Major numDocs inconsistency between 
leader and follower replicas in SolrCloud on HDFS)

 Major numDocs inconsistency between leader and follower replicas in SolrCloud 
 on HDFS, 20k vs 193k
 --

 Key: SOLR-7395
 URL: https://issues.apache.org/jira/browse/SOLR-7395
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDocs inconsistencies between leader and follower in 
 SolrCloud running on HDFS during bulk indexing jobs from Hive.
 See attached screenshots which show the leader/follower relationships and 
 screenshots of the core UI showing the huge numDocs discrepancies of 20k vs 
 193k docs.
 This initially seemed related to SOLR-4260, except that was supposed to be 
 fixed several versions ago and this is running on HDFS which may be the 
 difference.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Comment Edited] (SOLR-7394) Shard replicas don't recover after cluster wide restart

2015-04-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496220#comment-14496220
 ] 

Hari Sekhon edited comment on SOLR-7394 at 4/15/15 2:15 PM:


I don't have this cluster any more... so I only have what I saved at the time. 
I'm attaching a screenshot from the Cloud admin UI showing both replicas of a 
myCollection1 shard2 marked as recovery failed and the logs from all nodes.


was (Author: harisekhon):
I don't have this cluster any more... so I only have what I saved at the time. 
I'm attaching a screenshot showing both replicas of a shard marked as recovery 
failed and the logs from all nodes.

 Shard replicas don't recover after cluster wide restart
 ---

 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical
 Attachments: 145.solr.log, 146.solr.log, 147.solr.log, 148.solr.log, 
 149.solr.log, 150.solr.log, Solr_cores_not_recovering.png


 After cluster wide restart, some shards never come back online, with both 
 replicas staying red and not attempting to become leaders after one failed 
 recovery attempt. I eventually used the API to request recovery to trigger 
 them to recover and come back online, otherwise the shards stayed down 
 indefinitely.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon
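
The recovery request mentioned above refers to the CoreAdmin REQUESTRECOVERY action. 
A minimal sketch, with host and core name as placeholders:

{code}
# Minimal sketch of manually triggering recovery on a stuck replica.
# Host and core name are placeholders.
curl 'http://host:8983/solr/admin/cores?action=REQUESTRECOVERY&core=myCollection1_shard2_replica1&wt=json'
{code}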






[jira] [Commented] (SOLR-7394) Shard replicas don't recover after cluster wide restart

2015-04-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496220#comment-14496220
 ] 

Hari Sekhon commented on SOLR-7394:
---

I don't have this cluster any more... so I only have what I saved at the time. 
I'm attaching a screenshot showing both replicas of a shard marked as recovery 
failed and the logs from all nodes.

 Shard replicas don't recover after cluster wide restart
 ---

 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical

 After cluster wide restart, some shards never come back online, with both 
 replicas staying red and not attempting to become leaders after one failed 
 recovery attempt. I eventually used the API to request recovery to trigger 
 them to recover and come back online, otherwise the shards stayed down 
 indefinitely.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7394) Shard replicas don't recover after cluster restart

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7394:
--
Description: 
After cluster wide restart, some shards never come back online, with both 
replicas staying red and not attempting to become leaders after one failed 
recovery attempt. I eventually used the API to request recovery to trigger them 
to recover and come back online, otherwise the shards stayed down indefinitely.

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
After cluster wide restart, some shards never come back online, with both 
replicas staying red and not attempting to become leaders after one failed 
recovery attempt. I eventually used the API to request recovery to trigger them 
to recover and come back online.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


 Shard replicas don't recover after cluster restart
 --

 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical

 After cluster wide restart, some shards never come back online, with both 
 replicas staying red and not attempting to become leaders after one failed 
 recovery attempt. I eventually used the API to request recovery to trigger 
 them to recover and come back online, otherwise the shards stayed down 
 indefinitely.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7394) Shard replicas don't recover after cluster wide restart

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7394:
--
Summary: Shard replicas don't recover after cluster wide restart  (was: 
Shard replicas don't recover after cluster restart)

 Shard replicas don't recover after cluster wide restart
 ---

 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical

 After cluster wide restart, some shards never come back online, with both 
 replicas staying red and not attempting to become leaders after one failed 
 recovery attempt. I eventually used the API to request recovery to trigger 
 them to recover and come back online, otherwise the shards stayed down 
 indefinitely.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7395) Major numDocs inconsistency between leader and follower replicas in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7395:
--
Summary: Major numDocs inconsistency between leader and follower replicas 
in SolrCloud on HDFS  (was: Major numDocs inconsistency between leader and 
follower replicas in SolrCloud on HDFS, 20k vs 193k)

 Major numDocs inconsistency between leader and follower replicas in SolrCloud 
 on HDFS
 -

 Key: SOLR-7395
 URL: https://issues.apache.org/jira/browse/SOLR-7395
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDocs inconsistencies between leader and follower in 
 SolrCloud running on HDFS during bulk indexing jobs from Hive.
 See attached screenshots which show the leader/follower relationships and 
 screenshots of the core UI showing the huge numDocs discrepancies of 20k vs 
 193k docs.
 This initially seemed related to SOLR-4260, except that was supposed to be 
 fixed several versions ago and this is running on HDFS which may be the 
 difference.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7398) Major imbalance between shard doc counts 6k vs 193k in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7398:
--
Attachment: Cloud UI.png
149_core.png
147_core.png
146_core.png
145_core.png

 Major imbalance between shard doc counts 6k vs 193k in SolrCloud on HDFS
 

 Key: SOLR-7398
 URL: https://issues.apache.org/jira/browse/SOLR-7398
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDoc imbalance between shards in a collection such as 
 6k vs 193k docs between the 2 different shards.
 See attached screenshots which show the shards and replicas as well as the 
 core UI output of each of the shard cores taken at the same time.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Created] (SOLR-7399) Shard splitting lock timeout

2015-04-15 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7399:
-

 Summary: Shard splitting lock timeout
 Key: SOLR-7399
 URL: https://issues.apache.org/jira/browse/SOLR-7399
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Minor


When trying to split a shard I've encountered the following exception before:
{code}curl 
'http://host:8983/solr/admin/collections?action=SPLITSHARD&collection=test&shard=shard1&wt=json&indent=true'
{
  responseHeader:{
status:500,
QTime:3426},
  failure:{

:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'test_shard1_0_replica1': Unable to create core 
[test_shard1_0_replica1] Caused by: Lock obtain timed out: 
NativeFSLock@/data1/solr/test/index/write.lock},
  Operation splitshard caused 
exception::org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
 ADDREPLICA failed to create replica,
  exception:{
msg:ADDREPLICA failed to create replica,
rspCode:500},
  error:{
msg:ADDREPLICA failed to create replica,
trace:org.apache.solr.common.SolrException: ADDREPLICA failed to create 
replica\n\tat 
org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:364)\n\tat
 
org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:606)\n\tat
 
org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:172)\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:267)\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)\n\tat 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)\n\tat
 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:368)\n\tat 
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)\n\tat
 
org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)\n\tat
 
org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)\n\tat
 
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)\n\tat
 org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)\n\tat 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)\n\tat 
org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)\n\tat
 
org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)\n\tat
 java.lang.Thread.run(Thread.java:745)\n,
code:500}}
{code}
Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Created] (SOLR-7400) Collection creation fails when over-provisioning maxShardsPerNode > 1

2015-04-15 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7400:
-

 Summary: Collection creation fails when over-provisioning 
maxShardsPerNode > 1
 Key: SOLR-7400
 URL: https://issues.apache.org/jira/browse/SOLR-7400
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon


When trying to over-provision shards I've encountered an issue where the additional 
shard replicas try to use the same dataDir, resulting in a failure to obtain the index 
write lock for those replicas:
{code}curl 
'http://host:8983/solr/admin/collections?action=CREATE&name=test&numShards=6&maxShardsPerNode=6&replicationFactor=2&wt=json&indent=true'
{
  responseHeader:{
status:0,
QTime:3925},
  failure:{

:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'test_shard1_replica2': Unable to create core 
[test_shard1_replica2] Caused by: Lock obtain timed out: 
NativeFSLock@/data1/solr/test/index/write.lock,

:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'test_shard6_replica1': Unable to create core 
[test_shard6_replica1] Caused by: Lock obtain timed out: 
NativeFSLock@/data1/solr/test/index/write.lock,

:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'test_shard5_replica2': Unable to create core 
[test_shard5_replica2] Caused by: Lock obtain timed out: 
NativeFSLock@/data1/solr/test/index/write.lock,

:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'test_shard2_replica1': Unable to create core 
[test_shard2_replica1] Caused by: Lock obtain timed out: 
NativeFSLock@/data1/solr/test/index/write.lock,

:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'test_shard3_replica2': Unable to create core 
[test_shard3_replica2] Caused by: Lock obtain timed out: 
NativeFSLock@/data1/solr/test/index/write.lock,

:org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:Error 
CREATEing SolrCore 'test_shard4_replica1': Unable to create core 
[test_shard4_replica1] Caused by: Lock obtain timed out: 
NativeFSLock@/data1/solr/test/index/write.lock},
 success:{
:{
  responseHeader:{
status:0,
QTime:3225},
  core:test_shard5_replica1},
:{
  responseHeader:{
status:0,
QTime:3234},
  core:test_shard6_replica2},
:{
  responseHeader:{
status:0,
QTime:3248},
  core:test_shard1_replica1},
:{
  responseHeader:{
status:0,
QTime:3433},
  core:test_shard4_replica2},
:{
  responseHeader:{
status:0,
QTime:3620},
  core:test_shard3_replica1},
:{
  responseHeader:{
status:0,
QTime:3800},
  core:test_shard2_replica2}}}
{code}
Given this, it's not clear how you could place more than one shard per node in order 
to pre-provision for anticipated node growth.

Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Comment Edited] (SOLR-7394) Shard replicas don't recover after cluster wide restart

2015-04-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496220#comment-14496220
 ] 

Hari Sekhon edited comment on SOLR-7394 at 4/15/15 2:18 PM:


I don't have this cluster any more... so I only have what I saved at the time. 
I'm attaching a screenshot from the Cloud admin UI showing both replicas of a 
myCollection1 shard2 marked as recovery failed and the logs from all nodes.

What appears to have happened is that both replicas ended up in failed recovery and 
then neither of them wanted to become leader and retry. The reason both recoveries 
failed is not clear, however.


was (Author: harisekhon):
I don't have this cluster any more... so I only have what I saved at the time. 
I'm attaching a screenshot from the Cloud admin UI showing both replicas of a 
myCollection1 shard2 marked as recovery failed and the logs from all nodes.

 Shard replicas don't recover after cluster wide restart
 ---

 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical
 Attachments: 145.solr.log, 146.solr.log, 147.solr.log, 148.solr.log, 
 149.solr.log, 150.solr.log, Solr_cores_not_recovering.png


 After cluster wide restart, some shards never come back online, with both 
 replicas staying red and not attempting to become leaders after one failed 
 recovery attempt. I eventually used the API to request recovery to trigger 
 them to recover and come back online, otherwise the shards stayed down 
 indefinitely.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Created] (SOLR-7398) Major imbalance between shard doc counts 6k vs 193k in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7398:
-

 Summary: Major imbalance between shard doc counts 6k vs 193k in 
SolrCloud on HDFS
 Key: SOLR-7398
 URL: https://issues.apache.org/jira/browse/SOLR-7398
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon


I've observed major numDoc imbalance between shards in a collection such as 6k 
vs 193k docs between the 2 different shards.

See attached screenshots which show the shards and replicas as well as the 
core UI output of each of the shard cores taken at the same time.

Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7394) Shard replicas don't recover after cluster wide restart

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7394:
--
Attachment: Solr_cores_not_recovering.png
150.solr.log
149.solr.log
148.solr.log
147.solr.log
146.solr.log
145.solr.log

 Shard replicas don't recover after cluster wide restart
 ---

 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical
 Attachments: 145.solr.log, 146.solr.log, 147.solr.log, 148.solr.log, 
 149.solr.log, 150.solr.log, Solr_cores_not_recovering.png


 After cluster wide restart, some shards never come back online, with both 
 replicas staying red and not attempting to become leaders after one failed 
 recovery attempt. I eventually used the API to request recovery to trigger 
 them to recover and come back online, otherwise the shards stayed down 
 indefinitely.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7398) Major imbalance between different shard doc counts in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7398:
--
Summary: Major imbalance between different shard doc counts in SolrCloud on 
HDFS  (was: Major imbalance between shard doc counts 6k vs 193k in SolrCloud on 
HDFS)

 Major imbalance between different shard doc counts in SolrCloud on HDFS
 ---

 Key: SOLR-7398
 URL: https://issues.apache.org/jira/browse/SOLR-7398
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDoc imbalance between shards in a collection such as 
 6k vs 193k docs between the 2 different shards.
 See attached screenshots which show the shards and replicas as well as the 
 core UI output of each of the shard cores taken at the same time.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-7394) Shard replicas don't recover after cluster wide restart

2015-04-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496371#comment-14496371
 ] 

Hari Sekhon commented on SOLR-7394:
---

Checking both of those jiras, this appears to be a different issue: both replicas have 
already failed recovery and then neither wants to attempt recovery or take leadership 
again, so both stay down, leaving the shard offline even though both servers' Solr 
instances were restarted.

Those suggested jiras don't seem to be the same thing, as the exception I've seen 
around this was a failed recovery rather than ZooKeeper session expiration or tlog 
replay.

 Shard replicas don't recover after cluster wide restart
 ---

 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical
 Attachments: 145.solr.log, 146.solr.log, 147.solr.log, 148.solr.log, 
 149.solr.log, 150.solr.log, Solr_cores_not_recovering.png


 After cluster wide restart, some shards never come back online, with both 
 replicas staying red and not attempting to become leaders after one failed 
 recovery attempt. I eventually used the API to request recovery to trigger 
 them to recover and come back online, otherwise the shards stayed down 
 indefinitely.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Created] (SOLR-7393) HDFS bulk indexing performance

2015-04-15 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7393:
-

 Summary: HDFS bulk indexing performance
 Key: SOLR-7393
 URL: https://issues.apache.org/jira/browse/SOLR-7393
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3, 4.7.2
 Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
Reporter: Hari Sekhon
Priority: Critical


When switching SolrCloud from local dataDir to HDFS directory factory indexing 
performance falls through the floor.

A previous Hive to SolrCloud online indexing job that took 2 hours for 620M 
rows ended up taking a projected 20+ hours and never completing, usually 
breaking around the 16-17 hour timeframe when left overnight.

It's worth noting that I had to disable the HDFS write cache which was causing 
index corruption (SOLR-7255) on the advice of Mark Miller, who tells me this 
doesn't make much performance difference anyway.

This is probably also related to SolrCloud not respecting HDFS replication 
factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that 
solely doesn't account for the massive performance drop going from vanilla 
SolrCloud to SolrCloud on HDFS HA + Kerberos.

Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7393) HDFS poor bulk indexing performance

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7393:
--
Summary: HDFS poor bulk indexing performance  (was: HDFS bulk indexing 
performance)

 HDFS poor bulk indexing performance
 ---

 Key: SOLR-7393
 URL: https://issues.apache.org/jira/browse/SOLR-7393
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
Reporter: Hari Sekhon
Priority: Critical

 When switching SolrCloud from local dataDir to HDFS directory factory 
 indexing performance falls through the floor.
 A previous Hive to SolrCloud online indexing job that took 2 hours for 620M 
 rows ended up taking a projected 20+ hours and never completing, usually 
 breaking around the 16-17 hour timeframe when left overnight.
 It's worth noting that I had to disable the HDFS write cache which was 
 causing index corruption (SOLR-7255) on the advice of Mark Miller, who tells 
 me this doesn't make much performance difference anyway.
 This is probably also related to SolrCloud not respecting HDFS replication 
 factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but 
 that solely doesn't account for the massive performance drop going from 
 vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7393) HDFS poor indexing performance

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7393:
--
Summary: HDFS poor indexing performance  (was: HDFS poor bulk indexing 
performance)

 HDFS poor indexing performance
 --

 Key: SOLR-7393
 URL: https://issues.apache.org/jira/browse/SOLR-7393
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
Reporter: Hari Sekhon
Priority: Critical

 When switching SolrCloud from local dataDir to HDFS directory factory 
 indexing performance falls through the floor.
 I've also observed very high latency on both QTime and code timer on HDFS 
 writes compared to local dataDir writes (using check_solr_write.pl from 
 https://github.com/harisekhon/nagios-plugins). Single test document write 
 latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 
 on some runs.
 A previous bulk indexing Hive to SolrCloud online indexing job that took 2 
 hours for 620M rows ended up taking a projected 20+ hours and never 
 completing, usually breaking around the 16-17 hour timeframe when left 
 overnight.
 It's worth noting that I had to disable the HDFS write cache which was 
 causing index corruption (SOLR-7255) on the advice of Mark Miller, who tells 
 me this doesn't make much performance difference anyway.
 This is probably also related to SolrCloud not respecting HDFS replication 
 factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but 
 that solely doesn't account for the massive performance drop going from 
 vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7393) HDFS poor bulk indexing performance

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7393:
--
Description: 
When switching SolrCloud from local dataDir to HDFS directory factory indexing 
performance falls through the floor.

I've also observed very high latency on both QTime and code timer on HDFS 
writes compared to local dataDir writes (using check_solr_write.pl from 
https://github.com/harisekhon/nagios-plugins). Single test document write 
latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 on 
some runs.

A previous bulk indexing Hive to SolrCloud online indexing job that took 2 
hours for 620M rows ended up taking a projected 20+ hours and never completing, 
usually breaking around the 16-17 hour timeframe when left overnight.

It's worth noting that I had to disable the HDFS write cache which was causing 
index corruption (SOLR-7255) on the advice of Mark Miller, who tells me this 
doesn't make much performance difference anyway.

This is probably also related to SolrCloud not respecting HDFS replication 
factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that 
solely doesn't account for the massive performance drop going from vanilla 
SolrCloud to SolrCloud on HDFS HA + Kerberos.

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
When switching SolrCloud from local dataDir to HDFS directory factory indexing 
performance falls through the floor.

A previous Hive to SolrCloud online indexing job that took 2 hours for 620M 
rows ended up taking a projected 20+ hours and never completing, usually 
breaking around the 16-17 hour timeframe when left overnight.

It's worth noting that I had to disable the HDFS write cache which was causing 
index corruption (SOLR-7255) on the advice of Mark Miller, who tells me this 
doesn't make much performance difference anyway.

This is probably also related to SolrCloud not respecting HDFS replication 
factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that 
solely doesn't account for the massive performance drop going from vanilla 
SolrCloud to SolrCloud on HDFS HA + Kerberos.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


 HDFS poor bulk indexing performance
 ---

 Key: SOLR-7393
 URL: https://issues.apache.org/jira/browse/SOLR-7393
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
Reporter: Hari Sekhon
Priority: Critical

 When switching SolrCloud from local dataDir to HDFS directory factory 
 indexing performance falls through the floor.
 I've also observed very high latency on both QTime and code timer on HDFS 
 writes compared to local dataDir writes (using check_solr_write.pl from 
 https://github.com/harisekhon/nagios-plugins). Single test document write 
 latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 
 on some runs.
 A previous bulk indexing Hive to SolrCloud online indexing job that took 2 
 hours for 620M rows ended up taking a projected 20+ hours and never 
 completing, usually breaking around the 16-17 hour timeframe when left 
 overnight.
 It's worth noting that I had to disable the HDFS write cache which was 
 causing index corruption (SOLR-7255) on the advice of Mark Miller, who tells 
 me this doesn't make much performance difference anyway.
 This is probably also related to SolrCloud not respecting HDFS replication 
 factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but 
 that solely doesn't account for the massive performance drop going from 
 vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7393) HDFS poor indexing performance

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7393:
--
Description: 
When switching SolrCloud from local dataDir to HDFS directory factory indexing 
performance falls through the floor.

I've also observed very high latency on both QTime and code timer on HDFS 
writes compared to local dataDir writes (using check_solr_write.pl from 
https://github.com/harisekhon/nagios-plugins). Single test document write 
latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 on 
some runs.

A previous bulk online indexing job from Hive to SolrCloud that took 2 hours 
for 620M rows ended up taking a projected 20+ hours and never completing, 
usually breaking around the 16-17 hour timeframe when left overnight.

It's worth noting that I had to disable the HDFS write cache which was causing 
index corruption (SOLR-7255) on the advice of Mark Miller, who tells me this 
doesn't make much performance difference anyway.

This is probably also related to SolrCloud not respecting HDFS replication 
factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that 
solely doesn't account for the massive performance drop going from vanilla 
SolrCloud to SolrCloud on HDFS HA + Kerberos.

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
When switching SolrCloud from local dataDir to HDFS directory factory indexing 
performance falls through the floor.

I've also observed very high latency on both QTime and code timer on HDFS 
writes compared to local dataDir writes (using check_solr_write.pl from 
https://github.com/harisekhon/nagios-plugins). Single test document write 
latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 on 
some runs.

A previous bulk indexing Hive to SolrCloud online indexing job that took 2 
hours for 620M rows ended up taking a projected 20+ hours and never completing, 
usually breaking around the 16-17 hour timeframe when left overnight.

It's worth noting that I had to disable the HDFS write cache which was causing 
index corruption (SOLR-7255) on the advice of Mark Miller, who tells me this 
doesn't make much performance difference anyway.

This is probably also related to SolrCloud not respecting HDFS replication 
factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but that 
solely doesn't account for the massive performance drop going from vanilla 
SolrCloud to SolrCloud on HDFS HA + Kerberos.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


 HDFS poor indexing performance
 --

 Key: SOLR-7393
 URL: https://issues.apache.org/jira/browse/SOLR-7393
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.7.2, 4.10.3
 Environment: HDP 2.2 / HDP Search + LucidWorks Hive SerDe
Reporter: Hari Sekhon
Priority: Critical

 When switching SolrCloud from local dataDir to HDFS directory factory 
 indexing performance falls through the floor.
 I've also observed very high latency on both QTime and code timer on HDFS 
 writes compared to local dataDir writes (using check_solr_write.pl from 
 https://github.com/harisekhon/nagios-plugins). Single test document write 
 latency jumps from a few dozen milliseconds to 700-1700 millisecs, over 2000 
 on some runs.
 A previous bulk online indexing job from Hive to SolrCloud that took 2 hours 
 for 620M rows ended up taking a projected 20+ hours and never completing, 
 usually breaking around the 16-17 hour timeframe when left overnight.
 It's worth noting that I had to disable the HDFS write cache which was 
 causing index corruption (SOLR-7255) on the advice of Mark Miller, who tells 
 me this doesn't make much performance difference anyway.
 This is probably also related to SolrCloud not respecting HDFS replication 
 factor, effectively making 4 copies of data instead of 2 (SOLR-6528), but 
 that solely doesn't account for the massive performance drop going from 
 vanilla SolrCloud to SolrCloud on HDFS HA + Kerberos.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Created] (SOLR-7394) Shard replicas don't recover after cluster restart

2015-04-15 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7394:
-

 Summary: Shard replicas don't recover after cluster restart
 Key: SOLR-7394
 URL: https://issues.apache.org/jira/browse/SOLR-7394
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.10.3, 4.7.2
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
Priority: Critical


After cluster wide restart, some shards never come back online, with both 
replicas staying red and not attempting to become leaders after one failed 
recovery attempt. I eventually used the API to request recovery to trigger them 
to recover and come back online.

Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-4260) Inconsistent numDocs between leader and replica

2015-04-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496055#comment-14496055
 ] 

Hari Sekhon commented on SOLR-4260:
---

I've seen much larger discrepancies between leader and followers than in this ticket, 
on newer versions of Solr - tens to hundreds of thousands of numDocs difference when 
doing bulk online indexing jobs (hundreds of millions of docs) from Hive.

 Inconsistent numDocs between leader and replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.6.1, Trunk

 Attachments: 192.168.20.102-replica1.png, 
 192.168.20.104-replica2.png, SOLR-4260.patch, clusterstate.png, 
 demo_shard1_replicas_out_of_sync.tgz


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have a small 
 deviation in the number of documents. The leader and slave deviate by roughly 10-20 
 documents, not more.
 Results hopping ranks in the result set for identical queries got my attention: there 
 were small IDF differences for exactly the same record, causing it to shift positions 
 in the result set. During those tests no records were indexed. Consecutive catch-all 
 queries also return a different number of numDocs.
 We're running a 10 node test cluster with 10 shards and a replication factor 
 of two and frequently reindex using a fresh build from trunk. I've not seen 
 this issue for quite some time until a few days ago.






[jira] [Comment Edited] (SOLR-4260) Inconsistent numDocs between leader and replica

2015-04-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14496055#comment-14496055
 ] 

Hari Sekhon edited comment on SOLR-4260 at 4/15/15 11:16 AM:
-

I've seen much larger discrepancies between leader and followers than in this ticket, 
on newer versions of Solr - tens to hundreds of thousands of numDocs difference when 
doing bulk online indexing jobs (hundreds of millions of docs) from Hive. I'm not sure 
if it's related, but it seemed it would be marked as a duplicate if I raised it 
separately. I was using Solr 4.7.2 and Solr 4.10.3 when I observed this.


was (Author: harisekhon):
I've seen much larger discrepancies between leader and followers on newer 
versions of Solr than those reported in this ticket - tens to hundreds of 
thousands of numDocs difference when doing bulk online indexing jobs (hundreds 
of millions of docs) from Hive.

 Inconsistent numDocs between leader and replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.6.1, Trunk

 Attachments: 192.168.20.102-replica1.png, 
 192.168.20.104-replica2.png, SOLR-4260.patch, clusterstate.png, 
 demo_shard1_replicas_out_of_sync.tgz


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
 a small deviation in the number of documents. The leader and slave deviate 
 by roughly 10-20 documents, not more.
 Results hopping ranks in the result set for identical queries caught my 
 attention: there were small IDF differences for exactly the same record, 
 causing it to shift positions in the result set. During those tests no 
 records were indexed. Consecutive catch-all queries also return different 
 numDocs counts.
 We're running a 10-node test cluster with 10 shards and a replication factor 
 of two and frequently reindex using a fresh build from trunk. I hadn't seen 
 this issue for quite some time until a few days ago.






[jira] [Created] (SOLR-7395) Major numDocs inconsistency between leader and follower replicas in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7395:
-

 Summary: Major numDocs inconsistency between leader and follower 
replicas in SolrCloud on HDFS
 Key: SOLR-7395
 URL: https://issues.apache.org/jira/browse/SOLR-7395
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon


I've observed major numDocs inconsistencies between leader and follower in 
SolrCloud running on HDFS during bulk indexing jobs from Hive.

See attached screenshots.

Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Comment Edited] (SOLR-4260) Inconsistent numDocs between leader and replica

2015-04-15 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14496055#comment-14496055
 ] 

Hari Sekhon edited comment on SOLR-4260 at 4/15/15 11:37 AM:
-

I've seen much larger discrepancies between leader and followers on newer 
versions of Solr than those reported in this ticket when running on HDFS; it 
might be a separate issue, raised as SOLR-7395.


was (Author: harisekhon):
I've seen much larger discrepancies between leader and followers on newer 
versions of Solr than those reported in this ticket - tens to hundreds of 
thousands of numDocs difference when doing bulk online indexing jobs (hundreds 
of millions of docs) from Hive. I'm not sure if it's related, but it seemed it 
would be marked as a duplicate if I raised it separately. I was using Solr 
4.7.2 and Solr 4.10.3 when I observed this.

 Inconsistent numDocs between leader and replica
 ---

 Key: SOLR-4260
 URL: https://issues.apache.org/jira/browse/SOLR-4260
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
 Environment: 5.0.0.2013.01.04.15.31.51
Reporter: Markus Jelsma
Assignee: Mark Miller
Priority: Critical
 Fix For: 4.6.1, Trunk

 Attachments: 192.168.20.102-replica1.png, 
 192.168.20.104-replica2.png, SOLR-4260.patch, clusterstate.png, 
 demo_shard1_replicas_out_of_sync.tgz


 After wiping all cores and reindexing some 3.3 million docs from Nutch using 
 CloudSolrServer we see inconsistencies between the leader and replica for 
 some shards.
 Each core holds about 3.3k documents. For some reason 5 out of 10 shards have 
 a small deviation in the number of documents. The leader and slave deviate 
 by roughly 10-20 documents, not more.
 Results hopping ranks in the result set for identical queries caught my 
 attention: there were small IDF differences for exactly the same record, 
 causing it to shift positions in the result set. During those tests no 
 records were indexed. Consecutive catch-all queries also return different 
 numDocs counts.
 We're running a 10-node test cluster with 10 shards and a replication factor 
 of two and frequently reindex using a fresh build from trunk. I hadn't seen 
 this issue for quite some time until a few days ago.






[jira] [Updated] (SOLR-7395) Major numDocs inconsistency between leader and follower replicas in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7395:
--
Description: 
I've observed major numDocs inconsistencies between leader and follower in 
SolrCloud running on HDFS during bulk indexing jobs from Hive.

See attached screenshots.

This initially seemed related to SOLR-4260, except that was supposed to be 
fixed several versions ago and this is running on HDFS which may be the 
difference.

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
I've observed major numDocs inconsistencies between leader and follower in 
SolrCloud running on HDFS during bulk indexing jobs from Hive.

See attached screenshots.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


 Major numDocs inconsistency between leader and follower replicas in SolrCloud 
 on HDFS
 -

 Key: SOLR-7395
 URL: https://issues.apache.org/jira/browse/SOLR-7395
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDocs inconsistencies between leader and follower in 
 SolrCloud running on HDFS during bulk indexing jobs from Hive.
 See attached screenshots.
 This initially seemed related to SOLR-4260, except that was supposed to be 
 fixed several versions ago and this is running on HDFS which may be the 
 difference.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7395) Major numDocs inconsistency between leader and follower replicas in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7395:
--
Description: 
I've observed major numDocs inconsistencies between leader and follower in 
SolrCloud running on HDFS during bulk indexing jobs from Hive.

See attached screenshots which show the leader/follower relationships and 
screenshots of the core UI showing the huge numDocs discrepancies of 20k vs 
193k docs.

This initially seemed related to SOLR-4260, except that was supposed to be 
fixed several versions ago and this is running on HDFS which may be the 
difference.

Hari Sekhon
http://www.linkedin.com/in/harisekhon

  was:
I've observed major numDocs inconsistencies between leader and follower in 
SolrCloud running on HDFS during bulk indexing jobs from Hive.

See attached screenshots.

This initially seemed related to SOLR-4260, except that was supposed to be 
fixed several versions ago and this is running on HDFS which may be the 
difference.

Hari Sekhon
http://www.linkedin.com/in/harisekhon


 Major numDocs inconsistency between leader and follower replicas in SolrCloud 
 on HDFS
 -

 Key: SOLR-7395
 URL: https://issues.apache.org/jira/browse/SOLR-7395
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDocs inconsistencies between leader and follower in 
 SolrCloud running on HDFS during bulk indexing jobs from Hive.
 See attached screenshots which show the leader/follower relationships and 
 screenshots of the core UI showing the huge numDocs discrepancies of 20k vs 
 193k docs.
 This initially seemed related to SOLR-4260, except that was supposed to be 
 fixed several versions ago and this is running on HDFS which may be the 
 difference.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Updated] (SOLR-7395) Major numDocs inconsistency between leader and follower replicas in SolrCloud on HDFS

2015-04-15 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7395:
--
Attachment: Cloud UI.png
149_core.png
147_core.png
146_core.png
145_core.png

 Major numDocs inconsistency between leader and follower replicas in SolrCloud 
 on HDFS
 -

 Key: SOLR-7395
 URL: https://issues.apache.org/jira/browse/SOLR-7395
 Project: Solr
  Issue Type: Bug
  Components: Hadoop Integration, hdfs, SolrCloud
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon
 Attachments: 145_core.png, 146_core.png, 147_core.png, 149_core.png, 
 Cloud UI.png


 I've observed major numDocs inconsistencies between leader and follower in 
 SolrCloud running on HDFS during bulk indexing jobs from Hive.
 See attached screenshots.
 This initially seemed related to SOLR-4260, except that was supposed to be 
 fixed several versions ago and this is running on HDFS which may be the 
 difference.
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-7256) Multiple data dirs

2015-03-20 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371178#comment-14371178
 ] 

Hari Sekhon commented on SOLR-7256:
---

Btw, Elasticsearch supports multiple data dirs, so I replaced my SolrCloud 
deployment with Elasticsearch yesterday; it solved this data distribution 
problem and other issues around scaling.

 Multiple data dirs
 --

 Key: SOLR-7256
 URL: https://issues.apache.org/jira/browse/SOLR-7256
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon

 Request to support multiple dataDirs as indexing a large collection fills up 
 only one of many disks in modern servers (think colocating on Hadoop servers 
 with many disks).
 While HDFS is another alternative, it results in poor performance and index 
 corruption under high online indexing loads (SOLR-7255).
 While it should be possible to do multiple cores with different dataDirs, 
 that would be very difficult to manage and would not scale well operationally, 
 so I think Solr should support the use of multiple dataDirs natively.
 Regards,
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-7255) Index Corruption on HDFS whenever online bulk indexing (from Hive)

2015-03-18 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14366991#comment-14366991
 ] 

Hari Sekhon commented on SOLR-7255:
---

Yes, it was enabled. I've disabled it and re-ran the ingest, which got further 
without index corruption... however, indexing speed on HDFS is so bad compared 
to local disk that the bulk ingest that used to take 2 hours for 620M rows from 
Hive now runs for 16 hours and then fails with a broken pipe to the server... 
but that's a separate issue.

Back to this setting - I believe solr.hdfs.blockcache.write.enabled is still 
set to true by default according to this page:

https://cwiki.apache.org/confluence/display/solr/Running+Solr+on+HDFS

Default behaviour should probably be changed to false if this is buggy, then 
fixed and re-enabled when it works properly.
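
For reference, a minimal sketch of the relevant solrconfig.xml section with the 
write cache explicitly disabled (the HDFS home URI and confdir path are 
placeholders; the parameter names are the ones documented on the page above):
{code}
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
  <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <!-- explicitly turn off the block cache write path instead of relying on the default -->
  <bool name="solr.hdfs.blockcache.write.enabled">false</bool>
</directoryFactory>
{code}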

Is there another ticket documenting the work to fix this HDFS block write cache 
corruption issue (i.e. should we close this jira as a duplicate)?

 Index Corruption on HDFS whenever online bulk indexing (from Hive)
 --

 Key: SOLR-7255
 URL: https://issues.apache.org/jira/browse/SOLR-7255
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search + LucidWorks hadoop-lws-job.jar
Reporter: Hari Sekhon
Priority: Blocker

 When running SolrCloud on HDFS and using the LucidWorks hadoop-lws-job.jar to 
 index a Hive table (620M rows) to Solr it runs for about 1500 secs and then 
 gets this exception:
 {code}Exception in thread "Lucene Merge Thread #2191" 
 org.apache.lucene.index.MergePolicy$MergeException: 
 org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual 
 header=1494817490 vs expected header=1071082519 (resource: 
 BufferedChecksumIndexInput(_r3.nvm))
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:549)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:522)
 Caused by: org.apache.lucene.index.CorruptIndexException: codec header 
 mismatch: actual header=1494817490 vs expected header=1071082519 (resource: 
 BufferedChecksumIndexInput(_r3.nvm))
 at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:136)
 at 
 org.apache.lucene.codecs.lucene49.Lucene49NormsProducer.<init>(Lucene49NormsProducer.java:75)
 at 
 org.apache.lucene.codecs.lucene49.Lucene49NormsFormat.normsProducer(Lucene49NormsFormat.java:112)
 at 
 org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:127)
 at 
 org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
 at 
 org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
 at 
 org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
 at 
 org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3951)
 at 
 org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3913)
 at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3766)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
 at 
 org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
 {code}
 So I deleted the whole index, re-created it and re-ran the job to send the 
 Hive table contents to Solr again, and it returned exactly the same exception 
 the first time after sending a lot of updates to Solr.
 I moved off HDFS to a normal dataDir backend and then re-indexed the full 
 table in 2 hours successfully without index corruption.
 This implies some sort of stability issue in the HDFS DirectoryFactory 
 implementation.
 Regards,
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-6305) Ability to set the replication factor for index files created by HDFSDirectoryFactory

2015-03-18 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367192#comment-14367192
 ] 

Hari Sekhon commented on SOLR-6305:
---

I also tried creating a separate hadoop conf dir pointed to via 
solr.hdfs.confdir with hdfs dfs.replication=1, then restarted all Solr 
instances, deleted and recreated the collection and dataDir but found that it 
only set the write locks to rep factor 1 and still set the data/index/segments* 
to rep factor 2. Even setting dfs.replication cluster wide resulted in the same 
behaviour which is odd (I didn't bounce the NN + DNs since this should be hdfs 
client writer side config).

Not sure if this is related to SOLR-6528.
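
For reference, a minimal sketch of the kind of hdfs-site.xml override described 
above, placed in the directory pointed to by solr.hdfs.confdir (dfs.replication 
is the standard HDFS client property; the value is the one from this test):
{code}
<configuration>
  <property>
    <!-- intended to make Solr's HDFS client write index files with replication factor 1 -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
{code}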

 Ability to set the replication factor for index files created by 
 HDFSDirectoryFactory
 -

 Key: SOLR-6305
 URL: https://issues.apache.org/jira/browse/SOLR-6305
 Project: Solr
  Issue Type: Improvement
  Components: hdfs
 Environment: hadoop-2.2.0
Reporter: Timothy Potter

 HdfsFileWriter doesn't allow us to create files in HDFS with a different 
 replication factor than the configured DFS default because it uses: 
 {{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}
 Since we have two forms of replication going on when using 
 HDFSDirectoryFactory, it would be nice to be able to set the HDFS replication 
 factor for the Solr directories to a lower value than the default. I realize 
 this might reduce the chance of data locality but since Solr cores each have 
 their own path in HDFS, we should give operators the option to reduce it.
 My original thinking was to just use Hadoop setrep to customize the 
 replication factor, but that's a one-time shot and doesn't affect new files 
 created. For instance, I did:
 {{hadoop fs -setrep -R 1 solr49/coll1}}
 My default dfs replication is set to 3 ^^ I'm setting it to 1 just as an 
 example
 Then added some more docs to the coll1 and did:
 {{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}
 3 -- should be 1
 So it looks like new files don't inherit the repfact from their parent 
 directory.
 Not sure if we need to go as far as allowing different replication factor per 
 collection but that should be considered if possible.
 I looked at the Hadoop 2.2.0 code to see if there was a way to work through 
 this using the Configuration object but nothing jumped out at me ... and the 
 implementation for getServerDefaults(path) is just:
   public FsServerDefaults getServerDefaults(Path p) throws IOException {
     return getServerDefaults();
   }
 Path is ignored ;-)






[jira] [Comment Edited] (SOLR-6305) Ability to set the replication factor for index files created by HDFSDirectoryFactory

2015-03-18 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367192#comment-14367192
 ] 

Hari Sekhon edited comment on SOLR-6305 at 3/18/15 2:32 PM:


I'm also having problems with this in 4.10.3. I had tried creating a separate 
hadoop conf dir pointed to via solr.hdfs.confdir with hdfs dfs.replication=1, 
then restarted all Solr instances, deleted and recreated the collection and 
dataDir but found that it only set the write locks to rep factor 1 and still 
set the data/index/segments* to rep factor 2. Even setting dfs.replication 
cluster wide resulted in the same behaviour which is odd (I didn't bounce the 
NN + DNs since this should be hdfs client writer side config).

Not sure if this is related to SOLR-6528.


was (Author: harisekhon):
I also tried creating a separate hadoop conf dir pointed to via 
solr.hdfs.confdir with hdfs dfs.replication=1, then restarted all Solr 
instances, deleted and recreated the collection and dataDir but found that it 
only set the write locks to rep factor 1 and still set the data/index/segments* 
to rep factor 2. Even setting dfs.replication cluster wide resulted in the same 
behaviour which is odd (I didn't bounce the NN + DNs since this should be hdfs 
client writer side config).

Not sure if this is related to SOLR-6528.

 Ability to set the replication factor for index files created by 
 HDFSDirectoryFactory
 -

 Key: SOLR-6305
 URL: https://issues.apache.org/jira/browse/SOLR-6305
 Project: Solr
  Issue Type: Improvement
  Components: hdfs
 Environment: hadoop-2.2.0
Reporter: Timothy Potter

 HdfsFileWriter doesn't allow us to create files in HDFS with a different 
 replication factor than the configured DFS default because it uses: 
 {{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}
 Since we have two forms of replication going on when using 
 HDFSDirectoryFactory, it would be nice to be able to set the HDFS replication 
 factor for the Solr directories to a lower value than the default. I realize 
 this might reduce the chance of data locality but since Solr cores each have 
 their own path in HDFS, we should give operators the option to reduce it.
 My original thinking was to just use Hadoop setrep to customize the 
 replication factor, but that's a one-time shot and doesn't affect new files 
 created. For instance, I did:
 {{hadoop fs -setrep -R 1 solr49/coll1}}
 My default dfs replication is set to 3 ^^ I'm setting it to 1 just as an 
 example
 Then added some more docs to the coll1 and did:
 {{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}
 3 -- should be 1
 So it looks like new files don't inherit the repfact from their parent 
 directory.
 Not sure if we need to go as far as allowing different replication factor per 
 collection but that should be considered if possible.
 I looked at the Hadoop 2.2.0 code to see if there was a way to work through 
 this using the Configuration object but nothing jumped out at me ... and the 
 implementation for getServerDefaults(path) is just:
   public FsServerDefaults getServerDefaults(Path p) throws IOException {
     return getServerDefaults();
   }
 Path is ignored ;-)






[jira] [Comment Edited] (SOLR-6305) Ability to set the replication factor for index files created by HDFSDirectoryFactory

2015-03-18 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14367192#comment-14367192
 ] 

Hari Sekhon edited comment on SOLR-6305 at 3/18/15 2:32 PM:


I'm also having problems with this in Solr 4.10.3. I had tried creating a 
separate hadoop conf dir pointed to via solr.hdfs.confdir with hdfs 
dfs.replication=1, then restarted all Solr instances, deleted and recreated the 
collection and dataDir but found that it only set the write locks to rep factor 
1 and still set the data/index/segments* to rep factor 2. Even setting 
dfs.replication cluster wide resulted in the same behaviour which is odd (I 
didn't bounce the NN + DNs since this should be hdfs client writer side config).

Not sure if this is related to SOLR-6528.


was (Author: harisekhon):
I'm also having problems with this in 4.10.3. I had tried creating a separate 
hadoop conf dir pointed to via solr.hdfs.confdir with hdfs dfs.replication=1, 
then restarted all Solr instances, deleted and recreated the collection and 
dataDir but found that it only set the write locks to rep factor 1 and still 
set the data/index/segments* to rep factor 2. Even setting dfs.replication 
cluster wide resulted in the same behaviour which is odd (I didn't bounce the 
NN + DNs since this should be hdfs client writer side config).

Not sure if this is related to SOLR-6528.

 Ability to set the replication factor for index files created by 
 HDFSDirectoryFactory
 -

 Key: SOLR-6305
 URL: https://issues.apache.org/jira/browse/SOLR-6305
 Project: Solr
  Issue Type: Improvement
  Components: hdfs
 Environment: hadoop-2.2.0
Reporter: Timothy Potter

 HdfsFileWriter doesn't allow us to create files in HDFS with a different 
 replication factor than the configured DFS default because it uses: 
 {{FsServerDefaults fsDefaults = fileSystem.getServerDefaults(path);}}
 Since we have two forms of replication going on when using 
 HDFSDirectoryFactory, it would be nice to be able to set the HDFS replication 
 factor for the Solr directories to a lower value than the default. I realize 
 this might reduce the chance of data locality but since Solr cores each have 
 their own path in HDFS, we should give operators the option to reduce it.
 My original thinking was to just use Hadoop setrep to customize the 
 replication factor, but that's a one-time shot and doesn't affect new files 
 created. For instance, I did:
 {{hadoop fs -setrep -R 1 solr49/coll1}}
 My default dfs replication is set to 3 ^^ I'm setting it to 1 just as an 
 example
 Then added some more docs to the coll1 and did:
 {{hadoop fs -stat %r solr49/hdfs1/core_node1/data/index/segments_3}}
 3 -- should be 1
 So it looks like new files don't inherit the repfact from their parent 
 directory.
 Not sure if we need to go as far as allowing different replication factor per 
 collection but that should be considered if possible.
 I looked at the Hadoop 2.2.0 code to see if there was a way to work through 
 this using the Configuration object but nothing jumped out at me ... and the 
 implementation for getServerDefaults(path) is just:
   public FsServerDefaults getServerDefaults(Path p) throws IOException {
     return getServerDefaults();
   }
 Path is ignored ;-)






[jira] [Created] (SOLR-7255) Index Corruption on HDFS whenever online bulk indexing (from Hive)

2015-03-17 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7255:
-

 Summary: Index Corruption on HDFS whenever online bulk indexing 
(from Hive)
 Key: SOLR-7255
 URL: https://issues.apache.org/jira/browse/SOLR-7255
 Project: Solr
  Issue Type: Bug
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search + LucidWorks hadoop-lws-job.jar
Reporter: Hari Sekhon
Priority: Blocker


When running SolrCloud on HDFS and using the LucidWorks hadoop-lws-job.jar to 
index a Hive table (620M rows) to Solr it runs for about 1500 secs and then 
gets this exception:
{code}Exception in thread "Lucene Merge Thread #2191" 
org.apache.lucene.index.MergePolicy$MergeException: 
org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual 
header=1494817490 vs expected header=1071082519 (resource: 
BufferedChecksumIndexInput(_r3.nvm))
at 
org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:549)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:522)
Caused by: org.apache.lucene.index.CorruptIndexException: codec header 
mismatch: actual header=1494817490 vs expected header=1071082519 (resource: 
BufferedChecksumIndexInput(_r3.nvm))
at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:136)
at 
org.apache.lucene.codecs.lucene49.Lucene49NormsProducer.<init>(Lucene49NormsProducer.java:75)
at 
org.apache.lucene.codecs.lucene49.Lucene49NormsFormat.normsProducer(Lucene49NormsFormat.java:112)
at 
org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:127)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:108)
at 
org.apache.lucene.index.ReadersAndUpdates.getReader(ReadersAndUpdates.java:145)
at 
org.apache.lucene.index.BufferedUpdatesStream.applyDeletesAndUpdates(BufferedUpdatesStream.java:282)
at org.apache.lucene.index.IndexWriter._mergeInit(IndexWriter.java:3951)
at org.apache.lucene.index.IndexWriter.mergeInit(IndexWriter.java:3913)
at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3766)
at 
org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:409)
at 
org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:486)
{code}
So I deleted the whole index, re-created it and re-ran the job to send the Hive 
table contents to Solr again, and it returned exactly the same exception the 
first time after sending a lot of updates to Solr.

I moved off HDFS to a normal dataDir backend and then re-indexed the full table 
in 2 hours successfully without index corruption.

This implies some sort of stability issue in the HDFS DirectoryFactory 
implementation.

Regards,

Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Created] (SOLR-7256) Multiple data dirs

2015-03-17 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7256:
-

 Summary: Multiple data dirs
 Key: SOLR-7256
 URL: https://issues.apache.org/jira/browse/SOLR-7256
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon


Request to support multiple dataDirs as indexing a large collection fills up 
only one of many disks in modern servers (think colocating on Hadoop servers 
with many disks).

While HDFS is another alternative, it results in poor performance and index 
corruption under high online indexing loads (SOLR-7255).

While it should be possible to do multiple cores with different dataDirs, that 
would be very difficult to manage and would not scale well operationally, so I 
think Solr should support the use of multiple dataDirs natively.

Regards,

Hari Sekhon
http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-7256) Multiple data dirs

2015-03-17 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365047#comment-14365047
 ] 

Hari Sekhon commented on SOLR-7256:
---

In solrconfig.xml I would like to be able to provide multiple comma-separated 
dataDir paths, as you can in say Hadoop, and have Solr use the space on all of 
those disks equally (assuming that every directory specified is a separate disk 
- this is how Hadoop does it).

This way we would only deploy and manage one replica instance per node using 
the normal tooling, and it would simply follow the pre-configured 
solrconfig.xml to utilize all the different disks and their space.

The one problem I can see with this is that in Hadoop the configs are stored in 
local directories, e.g. /etc/hadoop/conf, but in SolrCloud they are stored in 
ZooKeeper, effectively forcing the same configuration down on all nodes, which 
may or may not have the same disks available (and quite likely one disk may 
fail, requiring the config to exclude it).

The workaround to that would be to use a variable such as ${solr.data.dir:} and 
have some kind of local /etc/solr/solr-env.sh that sets the variable, uniquely 
configurable per node if needed.
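
For reference, a minimal sketch of that workaround (the dataDir element and 
property name match the stock solrconfig.xml; the per-node value is only an 
example):
{code}
<!-- solrconfig.xml: resolve dataDir from a per-node system property, falling
     back to the default data directory when the property is not set -->
<dataDir>${solr.data.dir:}</dataDir>
{code}
Each node would then pass its own value, e.g. -Dsolr.data.dir=/data1/solr, to 
the JVM at startup from a local environment script.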

 Multiple data dirs
 --

 Key: SOLR-7256
 URL: https://issues.apache.org/jira/browse/SOLR-7256
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon

 Request to support multiple dataDirs as indexing a large collection fills up 
 only one of many disks in modern servers (think colocating on Hadoop servers 
 with many disks).
 While HDFS is another alternative, it results in poor performance and index 
 corruption under high online indexing loads (SOLR-7255).
 While it should be possible to do multiple cores with different dataDirs, 
 that would be very difficult to manage and would not scale well operationally, 
 so I think Solr should support the use of multiple dataDirs natively.
 Regards,
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Comment Edited] (SOLR-7256) Multiple data dirs

2015-03-17 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365047#comment-14365047
 ] 

Hari Sekhon edited comment on SOLR-7256 at 3/17/15 12:22 PM:
-

In solrconfig.xml I would like to be able to provide multiple comma-separated 
dataDir paths, as you can in say Hadoop, and have Solr use the space on all of 
those disks equally (assuming that every directory specified is a separate disk 
- this is how Hadoop does it).

This way we would only deploy and manage one replica instance per node using 
the normal tooling, and it would simply follow the pre-configured 
solrconfig.xml to utilize all the different disks and their space.

The one problem I can see with this is that in Hadoop the configs are stored in 
local directories, e.g. /etc/hadoop/conf, but in SolrCloud they are stored in 
ZooKeeper, effectively forcing the same configuration down on all nodes, which 
may or may not have the same disks available (and quite likely one disk may 
fail, requiring the config to exclude it).

The workaround to that would be to use a variable 
{code}${solr.data.dir:}{code} and have some kind of local /etc/solr/solr-env.sh 
that sets the variable, uniquely configurable per node if needed.


was (Author: harisekhon):
In solrconfig.xml I would like to be able to provide multiple comma-separated 
dataDir paths, as you can in say Hadoop, and have Solr use the space on all of 
those disks equally (assuming that every directory specified is a separate disk 
- this is how Hadoop does it).

This way we would only deploy and manage one replica instance per node using 
the normal tooling, and it would simply follow the pre-configured 
solrconfig.xml to utilize all the different disks and their space.

The one problem I can see with this is that in Hadoop the configs are stored in 
local directories, e.g. /etc/hadoop/conf, but in SolrCloud they are stored in 
ZooKeeper, effectively forcing the same configuration down on all nodes, which 
may or may not have the same disks available (and quite likely one disk may 
fail, requiring the config to exclude it).

The workaround to that would be to use a variable such as ${solr.data.dir:} and 
have some kind of local /etc/solr/solr-env.sh that sets the variable, uniquely 
configurable per node if needed.

 Multiple data dirs
 --

 Key: SOLR-7256
 URL: https://issues.apache.org/jira/browse/SOLR-7256
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon

 Request to support multiple dataDirs as indexing a large collection fills up 
 only one of many disks in modern servers (think colocating on Hadoop servers 
 with many disks).
 While HDFS is another alternative, it results in poor performance and index 
 corruption under high online indexing loads (SOLR-7255).
 While it should be possible to do multiple cores with different dataDirs, 
 that would be very difficult to manage and would not scale well operationally, 
 so I think Solr should support the use of multiple dataDirs natively.
 Regards,
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Commented] (SOLR-7256) Multiple data dirs

2015-03-17 Thread Hari Sekhon (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-7256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365136#comment-14365136
 ] 

Hari Sekhon commented on SOLR-7256:
---

RAID is fine if you're doing nothing but a purpose-built SolrCloud... but one 
of the best use cases right now is SolrCloud co-located with Hadoop, where 
there is a JBOD of many disks whose storage you can't utilize and manage well 
without this feature.

Perhaps a workaround would be to add better tooling for multiple shard replicas 
per node, one per disk? However, this goes back to the different-sizes problem, 
as shards can end up not being that well balanced.

With regards to locking across disks, the two options are 1) Solr locks a file 
(in any location / on any disk) and then controls the disk writes across all 
the disks, or 2) Solr acquires a lock per dataDir, as Hadoop does.

 Multiple data dirs
 --

 Key: SOLR-7256
 URL: https://issues.apache.org/jira/browse/SOLR-7256
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.10.3
 Environment: HDP 2.2 / HDP Search
Reporter: Hari Sekhon

 Request to support multiple dataDirs as indexing a large collection fills up 
 only one of many disks in modern servers (think colocating on Hadoop servers 
 with many disks).
 While HDFS is another alternative, it results in poor performance and index 
 corruption under high online indexing loads (SOLR-7255).
 While it should be possible to do multiple cores with different dataDirs, 
 that would be very difficult to manage and would not scale well operationally, 
 so I think Solr should support the use of multiple dataDirs natively.
 Regards,
 Hari Sekhon
 http://www.linkedin.com/in/harisekhon






[jira] [Created] (SOLR-7233) rename zkcli.sh script it clashes with zkCli.sh from ZooKeeper on Mac when both are in $PATH

2015-03-11 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7233:
-

 Summary: rename zkcli.sh script it clashes with zkCli.sh from 
ZooKeeper on Mac when both are in $PATH
 Key: SOLR-7233
 URL: https://issues.apache.org/jira/browse/SOLR-7233
 Project: Solr
  Issue Type: Task
Affects Versions: 4.10
Reporter: Hari Sekhon
Priority: Trivial


The Mac filesystem is case-insensitive, so on the command line zkcli.sh clashes 
with zkCli.sh from ZooKeeper when both are in the $PATH.






[jira] [Updated] (SOLR-7233) rename zkcli.sh script it clashes with zkCli.sh from ZooKeeper on Mac when both are in $PATH

2015-03-11 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7233:
--
Description: The Mac filesystem is case-insensitive, so on the command line 
zkcli.sh clashes with zkCli.sh from ZooKeeper when both are in the $PATH, 
breaking commands for one or the other unless the script path is fully 
qualified.  (was: The Mac filesystem is case-insensitive, so on the command 
line zkcli.sh clashes with zkCli.sh from ZooKeeper when both are in the $PATH.)
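
As a concrete example of qualifying the paths (the install locations below are 
placeholders for wherever each tool actually lives):
{code}
# Solr's cloud-scripts version
/opt/solr/example/scripts/cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd list
# ZooKeeper's own shell
/opt/zookeeper/bin/zkCli.sh -server zk1:2181
{code}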

 rename zkcli.sh script it clashes with zkCli.sh from ZooKeeper on Mac when 
 both are in $PATH
 

 Key: SOLR-7233
 URL: https://issues.apache.org/jira/browse/SOLR-7233
 Project: Solr
  Issue Type: Task
  Components: scripts and tools
Affects Versions: 4.10
Reporter: Hari Sekhon
Priority: Trivial

 The Mac filesystem is case-insensitive, so on the command line zkcli.sh 
 clashes with zkCli.sh from ZooKeeper when both are in the $PATH, breaking 
 commands for one or the other unless the script path is fully qualified.






[jira] [Updated] (SOLR-7233) rename zkcli.sh script it clashes with zkCli.sh from ZooKeeper on Mac when both are in $PATH

2015-03-11 Thread Hari Sekhon (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-7233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hari Sekhon updated SOLR-7233:
--
Component/s: scripts and tools

 rename zkcli.sh script it clashes with zkCli.sh from ZooKeeper on Mac when 
 both are in $PATH
 

 Key: SOLR-7233
 URL: https://issues.apache.org/jira/browse/SOLR-7233
 Project: Solr
  Issue Type: Task
  Components: scripts and tools
Affects Versions: 4.10
Reporter: Hari Sekhon
Priority: Trivial

 The Mac filesystem is case-insensitive, so on the command line zkcli.sh 
 clashes with zkCli.sh from ZooKeeper when both are in the $PATH, breaking 
 commands for one or the other unless the script path is fully qualified.






[jira] [Created] (SOLR-7095) Disaster Recovery native online cross-site replication for NRT SolrCloud

2015-02-10 Thread Hari Sekhon (JIRA)
Hari Sekhon created SOLR-7095:
-

 Summary: Disaster Recovery native online cross-site replication 
for NRT SolrCloud
 Key: SOLR-7095
 URL: https://issues.apache.org/jira/browse/SOLR-7095
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.10
Reporter: Hari Sekhon


Feature request to add native online cross-site DR support for NRT SolrCloud.

Currently, NRT DR recovery requires taking down the recovering cluster 
(including halting any new indexing), pointing one node per shard at the other 
datacenter's ZooKeeper ensemble so it can replicate, then taking it down again 
to switch back to the local DC's ZooKeeper ensemble once the shard has caught 
up. This is a relatively difficult and tedious manual operation, and it seems 
impossible to catch up completely when new update requests keep arriving during 
the downtime of switching back to the local DC's ZooKeeper ensemble, which 
prevents a 100% accurate catch-up.

There will be trade-offs, such as making cross-site replication asynchronous to 
avoid an update latency penalty, and it may require a last-write-wins type of 
scenario like Cassandra's.

Regards,

Hari Sekhon
http://www.linkedin.com/in/harisekhon


