[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247325#comment-15247325
 ] 

ASF subversion and git services commented on SOLR-8349:
---

Commit ffbbfbbe107cf42de6299c6c94b032bb21fe716f in lucene-solr's branch 
refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ffbbfbb ]

SOLR-8349: trying to address test failures


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Fix For: master, 6.1
>
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-19 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247323#comment-15247323
 ] 

ASF subversion and git services commented on SOLR-8349:
---

Commit 456d5c04c87744659a241f85d0fa04c683c81a2c in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=456d5c0 ]

SOLR-8349: trying to address test failures


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Fix For: master, 6.1
>
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245450#comment-15245450
 ] 

ASF subversion and git services commented on SOLR-8349:
---

Commit 5b680de13a40021825b71d35c04a790dde2d7f6a in lucene-solr's branch 
refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5b680de ]

SOLR-8349: Allow sharing of large in memory data structures across cores


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Fix For: master, 6.1
>
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245447#comment-15245447
 ] 

ASF subversion and git services commented on SOLR-8349:
---

Commit 489acdb5092676965e1dda067d35b58aeee8cf7d in lucene-solr's branch 
refs/heads/branch_6x from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=489acdb ]

SOLR-8349: Allow sharing of large in memory data structures across cores


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-18 Thread ASF subversion and git services (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245428#comment-15245428
 ] 

ASF subversion and git services commented on SOLR-8349:
---

Commit 9a1880aee821d4e6e96a8ff2fb15062b1e4c9eb1 in lucene-solr's branch 
refs/heads/master from [~noble.paul]
[ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=9a1880a ]

SOLR-8349: Allow sharing of large in memory data structures across cores


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-16 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15244256#comment-15244256
 ] 

Gus Heck commented on SOLR-8349:


also deleted the misnamed patch since it seems to sort to the top.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242847#comment-15242847
 ] 

Noble Paul commented on SOLR-8349:
--

The patch is not up to date with the master. Can you please update it

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR_8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-15 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242841#comment-15242841
 ] 

Gus Heck commented on SOLR-8349:


Actually looking closer it seems to be happening when I try to add the blob, 
and not in the code that relies on the blob, so this might be a separate issue. 
Adding a 2 second sleep after startup of the cluster in the test does not 
alleviate it so this test seems to be creating a condition where the blob store 
is not able to receive a blob. Whether that is an artifact of what's going on 
ins SolrCloudTestCase or not, I don't know.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR_8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-15 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242833#comment-15242833
 ] 

Gus Heck commented on SOLR-8349:


re-applied patch to trunk (no conflicts) and ran tests, and hit this... I had 
seen something like this before but I hadn't noted the seed at the time and it 
hadn't happened in a while so I assumed it had been fixed by something I did. 
Looks a little like it might be the blob store not being ready (SOLR-8772). 
This seed seems to reproduce it reliably, though the test passes most of the 
time for me. (Ran it 10 times, saw this twice).

{code}
 [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=BlobRepositoryCloudTest -Dtests.seed=387A9881AC04D1FB 
-Dtests.slow=true -Dtests.locale=mk -Dtests.timezone=America/Lower_Princes 
-Dtests.asserts=true -Dtests.file.encoding=UTF-8
   [junit4] ERROR   0.00s | BlobRepositoryCloudTest (suite) <<<
   [junit4]> Throwable #1: java.net.SocketException: Connection reset
   [junit4]>at 
__randomizedtesting.SeedInfo.seed([387A9881AC04D1FB]:0)
   [junit4]>at 
java.net.SocketInputStream.read(SocketInputStream.java:209)
   [junit4]>at 
java.net.SocketInputStream.read(SocketInputStream.java:141)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
   [junit4]>at 
org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
   [junit4]>at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:140)
   [junit4]>at 
org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:57)
   [junit4]>at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
   [junit4]>at 
org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
   [junit4]>at 
org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:167)
   [junit4]>at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
   [junit4]>at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
   [junit4]>at 
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:271)
   [junit4]>at 
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:184)
   [junit4]>at 
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:88)
   [junit4]>at 
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:110)
   [junit4]>at 
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:184)
   [junit4]>at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:82)
   [junit4]>at 
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:107)
   [junit4]>at 
org.apache.solr.core.BlobRepositoryCloudTest.postBlob(BlobRepositoryCloudTest.java:104)
   [junit4]>at 
org.apache.solr.core.BlobRepositoryCloudTest.setupCluster(BlobRepositoryCloudTest.java:59)
   [junit4]>at java.lang.Thread.run(Thread.java:745)
   [junit4] Completed [1/1 (1!)] in 8.94s, 0 tests, 1 error <<< FAILURES!
{code}

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR_8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-15 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242718#comment-15242718
 ] 

Noble Paul commented on SOLR-8349:
--

If you thing it is good enough, I can review and go ahead 


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR_8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-04-15 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15242710#comment-15242710
 ] 

Gus Heck commented on SOLR-8349:


Would be good to sort out what needs to be done for SOLR-8772 in time for 6.1. 

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR_8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-03-02 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175661#comment-15175661
 ] 

Gus Heck commented on SOLR-8349:


Right I got around that by creating .system first and uploading before I 
created a collection with a config that uses the blob store. I think some 
configurations that NEED the blob to function properly will want to fail if the 
blob store (and/or the blob) is not available. Both Using and not Using 
BlobStoreAware might be reasonable? Or maybe the interface should have a flag 
identifying if it is allowable to open the server without it?

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Sub-task
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR_8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-03-01 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174902#comment-15174902
 ] 

Noble Paul commented on SOLR-8349:
--

Thanks. the tests are fine. But for one thing,

The blob store is not guaranteed to be available at core load time (the 
{{.system}} collection ).  So , your component should implement 
{{BlobStoreAware}} and only in the callback for that interface , the class 
should try to load resources

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, SOLR_8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-29 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172624#comment-15172624
 ] 

Gus Heck commented on SOLR-8349:


Ah it seems a brand new class for cloud unit testing has magically appeared in 
SOLR-8758 just today. 

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-29 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172386#comment-15172386
 ] 

Gus Heck commented on SOLR-8349:


Hmm, Ive been wondering how I might write a good test case for it. Are there 
some tests out there that fire up a cloud mode solr and interact with multiple 
cores that I can use as example? I can probably come up with mock object based 
tests that exercise some of the methods,  but that rather misses the cross core 
sharing aspect of this and the usage by components. I expect SolrTestcaseJ4 
will probably hit the else block in this code:

{code}
if (this.coreContainer.isZooKeeperAware()) {
  // stuff we want to test is here!
} else {
  throw new SolrException(SolrException.ErrorCode.SERVER_ERROR, "Blob 
loading is not supported in non-cloud mode");
}
{code}

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15172260#comment-15172260
 ] 

Noble Paul commented on SOLR-8349:
--

The patch looks fine. If we have a testcase for this w could commit this right 
away

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-29 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171924#comment-15171924
 ] 

Gus Heck commented on SOLR-8349:


Oooops! sorry, it seems I uploaded the wrong file (and missed your response). 
Thx for following up!

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-29 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15171699#comment-15171699
 ] 

Noble Paul commented on SOLR-8349:
--

[~gus_heck] can u post the updated patch?

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169453#comment-15169453
 ] 

Noble Paul commented on SOLR-8349:
--

It's fine . But the patch doesn't have this extra method for decoder


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-26 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169187#comment-15169187
 ] 

Gus Heck commented on SOLR-8349:


Perhaps you were looking for a second hash map? We don't need one if we have 
more descriptive keys. I should have explained my latest patch. I was hasty.

Implementations of the Decoder interface can now (optionally) give their 
decoders names. 

{code}
 public interface Decoder {

/**
 * A name by which to distinguish this decoding. This only needs to be 
implemented if you want to support
 * decoding the same blob content with more than one decoder.
 * 
 * @return The name of the decoding, defaults to empty string.
 */
default String getName() { return ""; }

/**
 * A routine that knows how to convert the stream of bytes from the blob 
into a Java object.
 * 
 * @param inputStream the bytes from a blob
 * @return A Java object of the specified type.
 */
T decode(InputStream inputStream);
  }

{code}

The internal hashmap that holds blob content objects (be they decoded or not) 
will append this name to the key for put and get operations in getIncrementRef 
and store the appropriate 
value in the key field of BlobContent

{code}
  BlobContentRef getBlobIncRef(String key, Decoder decoder) {
return getBlobIncRef(key.concat(decoder.getName()), () -> 
addBlob(key,decoder));
  }

in  private  BlobContentRef getBlobIncRef(String key, 
Callable blobCreator) ...

aBlob = blobs.get(key);
if (aBlob == null) {
  try {
aBlob = blobCreator.call();

{code}

Note that the unmodified key was supplied to the lambda invoking addBlob...

{code}
  // for use cases sharing java objects
  private BlobContent addBlob(String key, Decoder decoder) {  
ByteBuffer b = fetchBlob(key);
String  keyPlusName = key + decoder.getName();
BlobContent aBlob = new BlobContent<>(keyPlusName, b, decoder);
blobs.put(keyPlusName, aBlob);
return aBlob;
  }
{code}

Thus the BlobContent object is holding the more descriptive key when we get to 
decrementBlobRefCount and do this:
{code}
  if (ref.blob.references.isEmpty()) {
blobs.remove(ref.blob.key);
  }
{code}

Also note that the differing method signatures distinguish when we don't have a 
decoder because we are interested in caching the raw bytes, not the decoded 
form and a different path is taken... In this case the key in the map matches 
the key in the blob store ensuring that we can't cache the raw bytes twice.

{code}
  public BlobContentRef getBlobIncRef(String key) {
   return getBlobIncRef(key, () -> addBlob(key));
  }

  // For use cases sharing raw bytes
  private BlobContent addBlob(String key) {
ByteBuffer b = fetchBlob(key);
BlobContent aBlob  = new BlobContent<>(key, b);
blobs.put(key, aBlob);
return aBlob;
  }
{code}

Looking at it again today, It occurs to me that there is a small chance that if 
someone wants to share the raw bytes and also share a decoded form of the same 
blob and they fail to name their Decoder we would have a race condition. This 
should be avoided by defaulting to "!D!" (or some other "reserved" string) 
instead of empty string in the above Decoder interface (and document this in 
the javadoc). It is intentionally left up to the implementor to decide whether 
or not they provide names that allow/prevent collisions.


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-26 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169057#comment-15169057
 ] 

Noble Paul commented on SOLR-8349:
--

really? I still think it doesn't take care of multiple decoders.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166522#comment-15166522
 ] 

Noble Paul commented on SOLR-8349:
--

One problem I see with the patch is, with decoding the object in two different 
ways . What if core1 has decoder1 and core2  has decoder2. Then the second call 
gets the output of first decoder. That's why I kept a map internally so that it 
is possible to deal with that usecase. It may be unusual to do so , but, for 
sake of correctness we have to do it


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-24 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163464#comment-15163464
 ] 

Gus Heck commented on SOLR-8349:


I had updated  on the 22nd. Should have pulled before I created the patch, but 
usually a one day lag isn't a problem. I didn't know you were going to create 
another issue and check it in :). I'll recreate the patch.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349_4.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-24 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15163001#comment-15163001
 ] 

Noble Paul commented on SOLR-8349:
--

Are you working on the latest trunk? I have already committed my other patch in 
SOLR-8719

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch, 
> SOLR-8349_4.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-23 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159601#comment-15159601
 ] 

Gus Heck commented on SOLR-8349:


Sure I could, but other folks who try to use this after us will likely stumble 
into that pitfall. I've played around with it and simplified BlobContent, moved 
the PluginBag specific stuff to PluginBag and provided some javadoc and a user 
friendly method on SolrCore which will ensure that a close hook is also 
created. It seems all my goals except #3 are satisfied and we've exceeded 
expectations for #2 making it possible to reliably update the content on the 
fly with no danger of memory leaks (yay). I definitely like this approach 
better. Attaching patch #4. The patch contains a class named 
org.apache.solr.handler.component.XXCustomComponent demonstrating usage 
(redacted and cleansed version of what I'm using for a client's server, adapted 
to this patch) This class obviously should not be included in the commit.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-22 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158135#comment-15158135
 ] 

Noble Paul commented on SOLR-8349:
--

bq.but I think there might be some holes WRT decoders decoding the same content 
more than once when multiple cores are loaded,

You can easily solve it by synchronizing the {{decode()}} code at your end

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-22 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15158113#comment-15158113
 ] 

Gus Heck commented on SOLR-8349:


Looked at the patch some today, Mostly it looks easy enough to adapt to my 
client's code, but I think there might be some holes WRT decoders decoding the 
same content more than once when multiple cores are loaded, and it seems we 
hold onto the ByteBuffer after the decoding, which doubles memory usage. Will 
comment more and provide suggestions (+ patch) tomorrow. Since decoding our 
file takes significant time and pegs the cpu's, I really don't want that 
repeating itself for all 40 cores :).

Loading code in my decoder will look something like this:
{code}
  ForkJoinPool pool = new 
ForkJoinPool(Runtime.getRuntime().availableProcessors());
  try (Stream lines = new BufferedReader(new 
InputStreamReader(inputStream, Charset.forName("UTF-8"))).lines()) {
try {
  pool.submit(() -> 
lines.parallel().forEach(this::processSimpleCsvRow)).get();
} catch (InterruptedException | ExecutionException e) {
  throw new IOException(e);
}
  } catch (IOException e) {
throw new RuntimeException("Cannot load  csv " , e);
  } finally {
pool.shutdownNow();
  }
{code}


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156521#comment-15156521
 ] 

David Smiley commented on SOLR-8349:


This is a nice suggestion [~noble.paul]; it looks remarkably close.  +1 to 
enhancing it to be more generic and subsume this use-case.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156323#comment-15156323
 ] 

Noble Paul commented on SOLR-8349:
--

bq.So are you proposing changing JarRepository, or adding a similar class 
called BlobRepository?

No, I would rename it to BlobRepository 

bq. Seems like fairly major surgery is required to make the JarRepository class 
fully generic.

I shall put up a patch which can do this

bq.I need to better understand the lazy="true" bit you mentioned,

I understand the problem with {{startup=lazy}} we probably should make a new 
interface called {{BlobStoreAware}} which loads the component when the 
BlobStore is available. But let's not keep it separate  

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156319#comment-15156319
 ] 

Gus Heck commented on SOLR-8349:


hmm looks like I didn't realize my ide had navigated me to another class 
getByteBuffer is on MemClassLoader...

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156315#comment-15156315
 ] 

Gus Heck commented on SOLR-8349:


I do see how what you suggest could be made to work if a decoder was supplied 
and the decoded version was what was cached. 

So are you proposing changing JarRepository, or adding a similar class called 
BlobRepository? JarRepository does seem very Jar focused. (returning 
JarContent, JarContentRef, etc..) It would for example no longer be possible to 
return a ByteBuffer type from getByteBuffer, since returning the bytes from 
which to decode a HashMap won't help. Seems like fairly major surgery is 
required to make the JarRepository class fully generic. 

I do like the versioned resource and config API bit though, that's nice.

I need to better understand the lazy="true" bit you mentioned, because I very 
much don't ever want a user query to be the thing that causes the resource to 
load (a process that could take minutes). The resource should be available and 
ready to go when the system starts accepting queries.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
>Assignee: Noble Paul
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156295#comment-15156295
 ] 

Noble Paul commented on SOLR-8349:
--

bq.That may come in handy if/when a refresh mechanism is desired.

The refresh mechanism is built into the system . The name is a combination of 
name and version example: {{myCsvFile/1}} (where '1' is the first version) 
Every component can be updated using the API. If you add a new version of the 
blob, you should just simply  update your component to use {{myCsvFile/2}}. The 
rest is automatically taken care of

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156293#comment-15156293
 ] 

Noble Paul commented on SOLR-8349:
--

OK , I got the point. The framework can be easily extended to making the codec 
pluggable. So the cache can just keep the decoded object in memory instead of 
ByteBuffer

So the API  would look like
{code}
MyCsvDecoder csvDecoder = null;//initiate your decoder that 
would convert your csv to MyCustomObject
ObjectRef ref = BlobRepository.getObjectIncRef(name, 
csvDecoder );
{code}

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156290#comment-15156290
 ] 

Gus Heck commented on SOLR-8349:


That sounds like a very cool way to store the source data, but this issue isn't 
about access to the blob/file itself. It's about access to the in memory java 
object created from the file/blob/whatever, so each core does not have to hold 
it's own identical copy in RAM. In our case we had a csv file that was 
translated into a HashMap (for fast access *during* query processing) that had 
>900MB of memory usage. When we fired up 40 cores (20 collections each with a 
main core and one replica) that used this component RAM usage was through the 
roof with no data in the index yet. 

A similar situation is faced by Analyzers that have dictionaries etc (see 
SOLR-3443). The current patch no longer supports a solution to that issue, as 
support for Lucene layer stuff has been deferred per discussion above.

It is interesting to see that there is an implementation of the ref counting to 
use as a model, and your notes about hooks on the component will be useful too. 
That may come in handy if/when a refresh mechanism is desired.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156236#comment-15156236
 ] 

Noble Paul commented on SOLR-8349:
--

[~gus_heck] There is another easy existing solution for this problem

How to do this

1) Store your large file in the blob store
2) Use {{blobRef = JarRepository.getJarIncRef(name)}} to get the content (we 
will change the method names for it to make sense for you)
3) Make your component register as a closehook
4) In the {{postClose()}}, do a {{blobref.decrementJarRefCount()}}


The advantages of this solution are,

1) You get a version managed store for your large files without screwing up 
your ZK
2) It is already refcounted etc

caveats are

1) It does not work for Standalone. We can extend it to do that
2) You will have to make your component {{startup=lazy}}

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-21 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156203#comment-15156203
 ] 

Gus Heck commented on SOLR-8349:


*WRT #3/derministic behavior*: Here's the use case:

# server is started, it loads a component that loads a file and creates 
resource A version 1 into memory
# some time later the file is updated, and these updates need to be deployed
# the new version 2 of the file is deployed to the server and the core is 
unloaded 
# the core is then loaded again and brought on line and made available to users.

We now cannot predict which version of the resource is available to the users. 
If GC occured and the resource was collected between steps 3 and 4 the new 
resource will become available as the user would expect. If not, the old 
resource will show up on calls to getResource() until a GC occurs in which the 
JVM decides to clear the weak reference to it. If the component caches a (hard) 
reference to the resource, the new version of the resource will never get 
loaded. The previous system without weak references did not allow the old 
resource to ever be unloaded (and hence was deterministic). Now the behavior is 
a product of GC timing and the internal aspects of how the component was 
programmed. I would like to subsequently (in some later patch) make it possible 
to refresh the resource in a predictable manner without restarting the whole 
node.

*WRT hard references*: I want people to have success not missteps and 
re-implementation using my feature :). For this reason I really like the weak 
references suggestion you made, but I want to manage it for them and not burden 
them with handling it properly. The submitted approach was meant to not bite 
the user who writes a component that never holds a reference to the resource. 
This would be a reasonable naive implementation for someone who knows nothing 
about the internals of solr and assumed they shouldn't hold the reference to 
ensure that the same resource was always seen everywhere.

*WRT the abstraction*: it's there to get the loading code added to the 
deferredCallables list.  SolrResourceLoader has no knowledge of the SolrCore 
until the core calls inform(core) on it. Unfortunately inform(resourceLoader) 
gets called before that. So any attempt to cast and do 
((SolrResourceLoader)loader).getCore().getContainer() in the implementation of 
ResourceLoader#inform(loader) will throw an NPE. That's why the 
deferredCallables list exists. I chose to add the abstraction to enable the 
loader/core to manage hard references and allow the processing to become 
uniform with all loads being deferred. I wanted the folks attempting to use 
this to have a clear intuitive path to do so and the interfaces are meant to 
guide them into doing the right thing without needing to know all the details.

It's worth noting that if the goal is a simple patch, the way to eliminate the 
MOST complexity from the patch is to have the component author manage 
references, and change: 
{code}
  resourceLoader.inform(resourceLoader);
  resourceLoader.inform(this); // last call before the latch is released.
{code}
 to
{code}
  resourceLoader.inform(this); 
  resourceLoader.inform(resourceLoader); // last call before the latch is 
released.
{code}

In that case, casting and navigating to the container in inform(ResourceLoader) 
will work and  we can loose the abstractions, the deferred callables and 
associated latch/synchronization, and the object reference code goes away 
too... but I definitely don't feel qualified to change the order in which 
components are made aware of things. I have no idea if any code out there would 
be relying on this order of inform() calls in some way. 

Lastly, Object key's are certainly possible, though this does reintroduce a 
vector for class loader memory leakages as previously discussed. I left this 
out because we were not supporting the lucene analyzers yet, and I wasn't yet 
adding "automatic" keys from configuration nodes. Automatic keys would be a 
nice feature to improve the feature and ensure implementors don't need to think 
so hard to use it. I'm amenable to try adding that now if you like, though the 
option to supply one's own key should remain.


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch, SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory 

[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-17 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15150714#comment-15150714
 ] 

Gus Heck commented on SOLR-8349:


Actually in my particular case since all cores need it anyway I'm just loading 
using parallel streaming and enough threads to soak up all CPU on the box until 
processing finishes... That's probably not a good solution for the general use 
case though. 

Anywho, after punting Goal 3 guava works easy as pie, updating to merge with 
latest if any now.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149913#comment-15149913
 ] 

David Smiley commented on SOLR-8349:


bq. One of the reasons I had the goal of not blocking is the case that 
motivated this for a client of mine was one with 40 cores in a node all using 
this one resource, but I can accept that this might be an unusual case.

I think you could work around that by having the Search Component lazily load 
the resource in a background thread, returning a Future.  Then when the Search 
Component is invoked in a search, it grabs what's in the Future, blocking if 
necessary.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149420#comment-15149420
 ] 

David Smiley commented on SOLR-8349:


I see your point on CacheLoader needing to define its loading up-front; the 
caller of get() doesn't specify.  I was confused with JDK's ConcurrentHashMap 
which I think in JDK 8 has a Function arg overloaded version.  One way around 
this is for the Key to be be some new class hypothetically named say 
LoadingCacheKey that contains the real intended key (e.g. Node.java) plus a 
reference to a JDK 8 Function (or Supplier or something like that) that the 
CacheLoader will call.  Then the CacheLoader would null out that function in 
the key.  Admittedly that a bit's hacky, but nonetheless puts most of the 
tricky concurrency code into a dependent library where others maintain it; not 
us.  That's my intent with this.

I'll comment more later; gotta go right now.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-16 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149286#comment-15149286
 ] 

Gus Heck commented on SOLR-8349:


I did see .build(CacheLoader). :) My writeup was an explanation of why that 
(and other tactics) don't work (for the defined goals). 

What would you pass into build(CacheLoader)? How will it know how to load the 
data for a custom component that hasn't even been written yet. Furthermore, 
that pattern is meant to load on the first get(String) for a key, which 
(without further code to cause a warming-get before core init completes) 
violates Goal 5, and could block a query or update for the time it takes to 
load (~15-20 sec if I employ all 16 CPU with parallel streams in my case).

If you do use build(CacheLoader) you will have to make your implementation of 
CacheLoader a wrapper object than can accept arbitrary loaders from yet to be 
written components, and then manage the acceptance and invocation of those 
individual loaders on it's own. One of the reasons I had the goal of not 
blocking is the case that motivated this for a client of mine was one with 40 
cores in a node all using this one resource, but I can accept that this might 
be an unusual case. 

I'll submit a patch backing off on the "don't block loading other cores" goal 
soon.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-16 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148911#comment-15148911
 ] 

David Smiley commented on SOLR-8349:


bq. Change our goals, and use guava (allow cores loading the same resource to 
all block each other until loading is done)

Seems totally fine to me provided this blocking is limited to the particular 
resource being loaded -- what Guava gives you via this: 
http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/cache/CacheBuilder.html#build(com.google.common.cache.CacheLoader)

It's a shame you spent so long on that looong write-up; as I read I wanted to 
say, _just stop please, stop already, just go see 
CacheBuilder.build(CacheLoader)_ :-)

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-16 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148797#comment-15148797
 ] 

Gus Heck commented on SOLR-8349:


Executive summary of the last overly long comment:

# Guava's cache is indeed a cool cache,
# It doesn't support arbitrary loaders in a way that is consistent with my 
design,
# Either we can do one of these things (AFAICT):
## Use the working code I wrote (no guava cache)
## Change our goals, and use guava  (allow cores loading the same resource to 
all block each other until loading is done)
## Use guava and wrap it in additional loader management code of similar 
complexity as my original code.
# Weak/soft values require someone to hold the strong reference. 

New thought this morning: I could probably add methods and a list to the 
SolrCore object for the purpose of giving it a reference to the resource at 
load time, thus tying the life-cycle of the resource to the object we want it 
to live and die with. Then weak values would probably work fine.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-15 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147835#comment-15147835
 ] 

Gus Heck commented on SOLR-8349:


Guava sounded like a great idea. I love guava caches and have used them 
frequently in the past. When you mentioned it I thought to myself why didn't I 
think of that... eventually I remembered that I had, but decided not to go 
there because I didn't want to give control of memory issues to a library. In 
reflection that may have been a bit too draconian, so I gave guava caches a 
whirl. For the first time in my experience, I think I've found a use case they 
don't cover well. My design for this feature desires the following behavior:

# get should return null if the resource is requested when no attempt has been 
made to load it.
# load a resource only once, no provision for update or replacement is 
presently required, so first one in wins is just fine. 
# subsequent attempts to load the resource are a non-blocking no-op, allowing 
cores 2 through N that require the resource to continue to configure themselves 
while resource is being loaded (possibly starting the loading of resources for 
other components).
# loading will be complete before the server completes startup and begins 
servicing requests. If a resource was supposed to be loaded and there was no 
error during loading, the component can rely on the existence of the resource 
at query (or update) time.
# never allow a query or update to solr to initiate the loading for obvious 
latency reasons.  

I realized that the unit test I supplied doesn't fully test #3 above so I 
modified it like this:
{code}
+final String[] lastThread = new String[1];
 // now add something to the map while it's under load
 try {
   Thread[] set1 = new Thread[100];

Then

   calls.incrementAndGet();
-});
+  lastThread[0] = Thread.currentThread().getName();
+}, "thread"+i); // give the threads names like thread0 thread1 thread2 
etc
   }
   for (Thread thread : set1) {
 thread.start();
+Thread.sleep(10);
   }
   while (calls.get() < 100) {
 Thread.sleep(100);
   }

Then

   Thread.sleep(100);
   long count1b = counter[0];
   long count2b = counter2[0];
+  
+  // make sure other loaders didn't wait and thread0 completed last
+  assertEquals("thread0", lastThread[0]);
{code}

This modified test still passes just fine with my original code. But so far I 
haven't made it pass with guava. The naive first attempt was to use 
get(key,loader) and ignore the return value and use getIfPresent(key) to 
service get requests (handling case #1) but this was not good:

# get(key,loader) ignoring the return value will block all cores that depend on 
this resource. The test fails with loader99 updating the array last.
# wraping the get(key,loader) in a thread fails because now they all complete 
before loading completes and the test gets null when it checks the assert that 
that the value was set. (it would eventually become set, but this is just like 
completing server startup before loading is complete, and then failing queries, 
probably with an NPE). Also, failing server startup when loading fails becomes 
problematic since we have to get the exception back out of the thread.

After those attempts I found myself getting into tracking what keys have 
already had loaders supplied and trying to coordinate blocking/pass through on 
my own, and of course the access to the collection recording what I've seen has 
to be synchronized... etc. This winds up negating most of the benefit of the 
guava cache. This is too bad because at first it looked like guava would 
replace a bunch of my code with one liners. My guess is that the guava folks 
didn't supply put(k,loader) because most of the time you aren't accepting 
arbitrary loaders in the first place. If the loader is known in advance, then 
LoadingCache works nicely. Unfortunately, for us, the loader is not known in 
advance. I don't see any reasonable way to write "one loader to rule them all"  
to allow us to use a LoadingCache either, this would just move all my tracking 
code into TheOneLoader.java :). Possibly a good reorganization code wise but we 
are still writing threaded code and not getting much simplification out of 
using guava caches that way. If we are going to write synchronized code, we 
should just make it do the right thing directly I think.

Finally, weakValues() (or some equivalent) sounds good, but the problem is that 
the component supplies a loader and the container loads it at time A, and then 
at time B the component asks for the result of the load. Even if the component 
caches the results of that first request, the time between A and B leaves the 
object just loaded vulnerable to GC, since the only thing referencing it is the 
weak values cache. One could require the loader to 

[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142878#comment-15142878
 ] 

David Smiley commented on SOLR-8349:


First; I propose the container level cache be committed by itself (no Lucene 
abstraction or anything explicitly using it).  It will enable us Solr hackers 
to at least explicitly use it in our components without resorting to using a 
static field on a top-level class-loaded class.  A positive step forward.

Now to clear something up; I definitely do not propose that Lucene have any 
sort of dependency on Solr.  And I further propose that no change of any kind 
is needed to any of Lucene.  I'll be more specific now that I see the relevant 
part of Solr.  I propose that, Solr's FieldTypePluginLoader.readAnalyzer 
(called by IndexSchema) detect a flag attribute on a TokenizerFactory, 
TokenFilter, or CharFilterFactory that declares that it be global, and if so 
load it with a CoreContainer ResourceLoader (not a SolrCore one)  into the 
shared cache (if it's not already there).  To be specific.  This doesn't affect 
the ResourceLoader abstraction, and components don't need to be written any 
differently.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-11 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142799#comment-15142799
 ] 

Gus Heck commented on SOLR-8349:


I am not sure I understand what you mean by load the analysis component into 
the container cache. The analysis component in lucene needs some way to 
reference the container and ask for the shared resource. My assumption was that 
it was not ok to add dependencies to solr dependencies into Lucene. Otherwise I 
could just cast to SolrResourceLoader in the analysis component in question. 
Did you look at SOLR-3443 where I use this patch to implement the feature for 
HunspellStemFilterFactory.java. do this:

{code}
if (loader instanceof ContainerResourceSharing) {
resourceSharing = (ContainerResourceSharing) loader;
{code}

this then allows me to have a method like:
{code}
  public Dictionary getDictionary() {
return this.dictionary == null ? (Dictionary) 
resourceSharing.getContainerResource(resourceKey) : this.dictionary;
  }
{code}
If I can use a Solr Class, that could just as easily say 

{code}
if (loader instanceof SolrResourceLoader) {
resourceSharing = (SolrResourceLoader) loader;
{code}

One could also add methods to ResourceLoader itself, but then all forms of 
resource loader have to deal with implementing this method.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-11 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143872#comment-15143872
 ] 

Gus Heck commented on SOLR-8349:


Certainly we can punt enabling this for lucene level things to start with, and 
then tackle that in SOLR-3443 or in a separate Lucene ticket if appropriate. 
I'm totally fine with that. My motivating use case that got me started on this 
was a SearchComponent anyway.

Now that I understand it, your idea is interesting. It generally sounds like it 
would allow a similar memory savings, though I had assumed promoting things to 
a higher class loader level would be undesirable. Here's some things that 
occurred to me as I pondered your suggestion... 

# My (potentially erroneous) assumption is that the whole reason for the class 
loader separation is to allow long term stability with loading/unloading cores 
and not retain references to classes and objects that aren't needed anymore. In 
the extreme, if cores were loaded in the same class loader as the container 
this issue would be less complex.
# Putting the entire complexity of the custom Filter/Analyzer etc into the 
cache greatly enhances the chance of a class loader memory leak. Minimizing the 
complexity of what's cached puts the programmer in the best position to ensure 
core level classes don't become referenced by objects held in the container. 
Note that a follow on to this (or perhaps something required for it?) might be 
to reference count and unload unused keys.
# That said, why would we do it one way for analysis classes and another for 
components? If your direction is selected for analysis classes perhaps we 
should do components that way too?
# If we do that, what's left to be loaded in the core level loader? The 
non-increasing set of classes never previously loaded as global and whatever is 
not referenced by any component/analysis class I guess... 
# I'm a little concerned about how we will manage to automatically create 
appropriate keys in the cache. The same analysis class or component may be 
configured multiple times and so we need a key that hashes the important 
configuration parameters to distinguish identical instances from variants. 
Automatic determination of "important" seems dicey though we could simply be 
pessimistic and use every configuration parameter we can find, but then we need 
to know which fields are representative of configuration parameters 
(annotation? but then we're modifying lucene again, drat) or intercept this 
information as we read the configuration, before we create the instance of the 
class? Do we have classes that configure complex sub components and hold those 
as fields?

In SOLR-3443 I do the following to generate a key for the resource cache: 

{code}
md5.update(cs.encode(dictionaryFiles).array());
md5.update(cs.encode(affixFile).array());
md5.update(cs.encode(String.valueOf(ignoreCase)).array());
md5.update(cs.encode(String.valueOf(longestOnly)).array());

// ** SNIP ** //

resourceKey = "org.apache.lucene.analysis.hunspell.dictionary." + configHash;
{code}

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-11 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15144017#comment-15144017
 ] 

David Smiley commented on SOLR-8349:


1. the classLoader separation is simply because there are libs and conf/ 
resources per-core, as cores might be for entirely separate purposes with 
different dependencies that might not even be compatible if it were all 
together.
2. RE memory-leak:  I propose the weakValues() feature of Guava's cache.  It's 
a nice cache :-)  No reference counting needed; we're in a VM that GCs.
3. you're right; this isn't just about analysis components.  But it would 
probably be incompatible with anything that implements CoreAware or SchemaAware 
as both those things are tied to a SolrCore.
4. RE "what's left to be loaded in the core level loader": I think this re-use 
feature should be explicitly opt-in.  Most stuff will cotinue as-is -- core 
level loader.  Docs for HunspellFactory on the ref guide could be augmented to 
suggest flagging global-reuse because we know it uses a lot of memory.
5. RE keys: I think just the W3C DOM Node.java instance is probably fine.  We 
might want to double-check it's equals() method makes sense and doesn't refer 
to the parent, or if it does find a way to clone/detach it.  Actually I think 
it does; so we'd call cloneNode(true) and use the result of that.  I'm not 
concerned on the efficiency of the keys; this cache isn't going to be hit hard.

Any way, lets continue this discussion if need be on a follow-on issue for 
actually _using_ the cache that this issue will create.  Cool?

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-10 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140881#comment-15140881
 ] 

Gus Heck commented on SOLR-8349:


Can this make it into 6.0? I'm slightly surprised that a feature that can save 
memory, especially for nodes with large numbers of cores hasn't generated more 
interest. 

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-10 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140931#comment-15140931
 ] 

David Smiley commented on SOLR-8349:


I take back my claim that I'll commit it if only it used Guava.  I forgot about 
the Lucene layer part.  I think you should file a Lucene issue on introducing 
that interface.  It'll get input from more core Lucene devs.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-10 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15140924#comment-15140924
 ] 

David Smiley commented on SOLR-8349:


Instead of a hand-rolled cache with separate per-key locks I suggest Guava's 
cache which already does this.  See:  
https://code.google.com/p/guava-libraries/wiki/CachesExplained

I'll commit your patch to 6 if you use that.  

bq. I'm slightly surprised that a feature that can save memory, especially for 
nodes with large numbers of cores hasn't generated more interest.

I think internal plumbing issues like this capture less interest since it's 
something another Solr hacker can work around.  I've solved what you're solving 
here before for a client but without hacking Solr. It wasn't elegant... but I 
recall I put a lib on the top level classpath (not per-core) and I used a 
static member to hold the cache.

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-10 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141171#comment-15141171
 ] 

Gus Heck commented on SOLR-8349:


Thanks for the feedback Dave, I generally like guava caches, so this sounds 
like a good idea. I seem to recall that the Lucene part was motivated by the 
desire to have this fix SOLR-3443. I suspect it could be pulled out if needed. 
Should the Lucene ticket simply point to this or do I need to generate a 
separate patch for that that breaks things out?

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2016-02-10 Thread David Smiley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15142190#comment-15142190
 ] 

David Smiley commented on SOLR-8349:


I put a little more thought into this issue now.  The cache at the 
CoreContainer makes sense but I'm not convinced that the Lucene layer needs a 
new abstraction.  Instead, I think Solr could be enhanced to load some analysis 
components into this CoreContainer cache.  The question is which ones?  Not 
all... some components will be refer to resources that are local to a SolrCore. 
 I'm not sure how easy it would be for Solr to detect that automatically; 
probably not easy.  That leaves the possibility of a new attribute on the core 
to designate it as globally shared.  What do you think?

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2015-12-24 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15071258#comment-15071258
 ] 

Gus Heck commented on SOLR-8349:


Has anyone had time to look at this?

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2015-12-07 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045423#comment-15045423
 ] 

Gus Heck commented on SOLR-8349:


I'm sure folks have been busy getting stuff into 5.4, but with the RC nearly 
produced, perhaps someone can review this now? It's intended for 6.x, but I'd 
like feedback/review so that I have something to chew on (unless of course it's 
already perfect :) ). I would definitely like someone who's comfortable with 
{{CoreContainer}} and class loading in Solr to take a peek. I put the 
definition for {{ContainerResourceSharing}} in Lucene package space. It needed 
to be there so that the hunspell analyzer for SOLR-3443 could leverage it 
without depending on Solr classes (or any other Lucene class that might load a 
large file that doesn't change from core to core). This patch adds no 
implementation in the Lucene world, only for {{SolrResourceLoader}}. Folks who 
are using Lucene in a container other than Solr can implement for their 
container or not as they desire.

The intended pattern for usage within Solr (i.e. {{SearchComponent}} 
implementations) is to either navigate to {{CoreContainer}} from a core if 
{{CoreAware}} or cast the ResourceLoader if only {{ResourceLoaderAware}}. 
Either path leads to the same place as SolrResourceLoader merely delegates to 
it's CoreContainer anyway. 

At the Lucene level, code would want to test {{instanceof 
ContainerResourceSharing}} and do the right thing based on the result, 
retaining existing behavior to support use of Lucene outside of containers on 
in containers without support.  Let me know if there are better ideas here...

> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-8349) Allow sharing of large in memory data structures across cores

2015-11-30 Thread Gus Heck (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-8349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15032007#comment-15032007
 ] 

Gus Heck commented on SOLR-8349:


Further background for this feature... a precursor version of this patch (which 
does not have the interface and thus can't fix SOLR-3443) is in use at a client 
where we had a ~900MB hash map for looking up lat/lon from custom query 
parameters. This map was needed for all our cores. This is anticipated to save 
>35GB of ram since 40+ cores will live on a machine. The original 
implementation of this lat/lon lookup feature for the client attempted to use a 
static field, but the independent class loaders (each core gets it's own class 
loader) loaded fresh copies of the class each with it's own static map.

It's worth noting that the analyzers such as the hunspell one in SOLR-3443 are 
not loaded by the core's class loaders and the excess memory there is held in a 
member field per instance, so a static variable based solution would be 
possible there. I thought it was better to provide a uniform solution. 

Another possible follow on feature (or perhaps enhancement to this one) would 
be a means of reference counting the shared resources and removing them. In the 
present (initial) patch, a long running solr instance where lots of cores are 
added and removed would potentially have unused container resources hanging 
around (though they would become used again with no loading time if a core were 
re-installed that required them). I didn't go into the complexity of removal 
because I wasn't sure if it would be deemed necessary.


> Allow sharing of large in memory data structures across cores
> -
>
> Key: SOLR-8349
> URL: https://issues.apache.org/jira/browse/SOLR-8349
> Project: Solr
>  Issue Type: Improvement
>  Components: Server
>Affects Versions: 5.3
>Reporter: Gus Heck
> Attachments: SOLR-8349.patch
>
>
> In some cases search components or analysis classes may utilize a large 
> dictionary or other in-memory structure. When multiple cores are loaded with 
> identical configurations utilizing this large in memory structure, each core 
> holds it's own copy in memory. This has been noted in the past and a specific 
> case reported in SOLR-3443. This patch provides a generalized capability, and 
> if accepted, this capability will then be used to fix SOLR-3443.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org