Re: retrieve ids of all indexed docs efficiently
Added a tip on the CursorMark CWiki page, thanks for the suggestion!

On Wed, Jan 18, 2017 at 5:21 PM, Pushkar Raste wrote:
> I think we should add the suggestion about docValues to the cursorMark wiki
> (documentation); we ran into the same problem too.
Re: retrieve ids of all indexed docs efficiently
I think we should add the suggestion about docValues to the cursorMark wiki (documentation); we ran into the same problem too.

On Jan 18, 2017 5:52 PM, "Erick Erickson" wrote:
> Is your ID field docValues? Making it a docValues field should reduce
> the amount of JVM heap you need.
Re: retrieve ids of all indexed docs efficiently
Is your ID field docValues? Making it a docValues field should reduce the amount of JVM heap you need.

But the export is _much_ preferred; it'll be lots faster as well. Of course, to export you need the values you're returning to be docValues...

Erick

On Wed, Jan 18, 2017 at 1:12 PM, Slomin, David wrote:
> The export feature sounds promising, although I'll have to talk with our
> deployment folks here about enabling it.
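If the schema is managed, the docValues change Erick suggests could be made through the Schema API's `replace-field` command. The sketch below only builds the JSON body. The field attributes (`string` type, indexed, stored, required) are assumptions, so copy whatever your current id definition actually has: `replace-field` replaces the whole definition rather than merging into it. The host and collection in the comment are placeholders. Two more assumptions to check: that your Solr version lets you replace the uniqueKey field this way (if not, edit schema.xml by hand), and that you reindex afterwards, because existing documents do not gain docValues retroactively.

```python
import json


def docvalues_id_field_command(field_name: str = "id",
                               field_type: str = "string") -> dict:
    """Build a Schema API 'replace-field' command that turns on docValues.

    Assumption: the id field is an indexed, stored, required, single-valued
    string field. Copy the attributes your real definition has, because
    replace-field swaps in the whole definition instead of merging.
    """
    return {
        "replace-field": {
            "name": field_name,
            "type": field_type,
            "indexed": True,
            "stored": True,
            "required": True,
            "multiValued": False,
            "docValues": True,  # the attribute Erick is asking about
        }
    }


if __name__ == "__main__":
    body = json.dumps(docvalues_id_field_command(), indent=2)
    # POST `body` to http://<host>:8983/solr/<collection>/schema
    # (placeholders), then reindex so existing docs get docValues.
    print(body)
```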
Re: retrieve ids of all indexed docs efficiently
The export feature sounds promising, although I'll have to talk with our deployment folks here about enabling it.

The query I'm issuing is:

http://:8983/solr/_shard1_replica1/select?q=*:*&sort=id+asc&rows=1000&cursorMark=&fl=id&omitHeader=true&distrib=false&wt=json

Thanks,
Div.

On 1/18/17, 3:54 PM, "Jan Høydahl" wrote:
> Don't know why you have mem problems. Can you paste in examples of full
> query strings during cursor mark querying? Sounds like you may be using it
> wrong.
Re: retrieve ids of all indexed docs efficiently
I don't know why you're having memory problems. Can you paste in examples of the full query strings you send during cursorMark querying? It sounds like you may be using it wrong.

Or try exporting:

https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets

--
Jan Høydahl

> On 18 Jan 2017, at 21:44, Slomin, David wrote:
>
> I'd like to retrieve the ids of all the docs in my Solr 5.3.1 index.
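A rough sketch of what an export request for just the ids might look like, in Python. It assumes the /export handler is already enabled on the core and that `id` has docValues, since /export needs both `sort` and `fl` and every field named in them must be docValues. The base URL and core name are placeholders, and the response shape parsed here (`response.docs`, the same layout as a normal JSON select response) is an assumption to verify against your Solr version.

```python
import json
from urllib.parse import urlencode


def export_url(base_url: str, core: str, id_field: str = "id") -> str:
    """Build an /export request that streams every id, sorted by id.

    /export requires both sort and fl; every field named in them must
    have docValues. base_url and core are placeholders.
    """
    params = {"q": "*:*", "sort": f"{id_field} asc", "fl": id_field}
    return f"{base_url}/solr/{core}/export?{urlencode(params)}"


def ids_from_export(body: str, id_field: str = "id") -> list:
    """Pull the ids out of an /export JSON body (assumed response.docs shape)."""
    return [doc[id_field] for doc in json.loads(body)["response"]["docs"]]


if __name__ == "__main__":
    print(export_url("http://localhost:8983", "mycore"))
```

Unlike the cursorMark loop, this is a single streaming request, which is why it avoids paging round trips.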
retrieve ids of all indexed docs efficiently
Hi --

I'd like to retrieve the ids of all the docs in my Solr 5.3.1 index. In my query, I've set rows=1000 and fl=id, and I'm using the cursorMark mechanism to split the overall traversal into multiple requests. Not because I care about the order, but because the documentation implies that it's necessary to make cursorMark work reliably, I've also set sort=id asc. While this does give me the data I need on a smaller index, it causes heap memory utilization to go through the roof; for our large indices, the Solr JVM throws an out-of-memory exception, and we've already configured the heap as large as is practical given the physical memory of the machine.

For what it's worth, we do use SolrCloud to split each of our indices into multiple shards. However, for this query I'm addressing a single shard directly (connecting to the correct Solr server instance for one replica of that shard and setting distrib=false in my query) rather than relying on Solr to route and assemble the results.

Thanks in advance,
Div Slomin.
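A minimal sketch of the cursorMark traversal described above, in Python. `iter_ids`, `http_fetch` and `fake_fetch` are hypothetical helpers, not part of any Solr client library, and any core URL you pass in is a placeholder. The loop follows the documented cursorMark contract: start with `cursorMark=*`, sort on something that includes the uniqueKey field, and stop when Solr returns a `nextCursorMark` equal to the one you sent. The loop by itself does not reduce server-side heap use. As I understand it, sorting on a non-docValues field in Solr 5 uninverts that field onto the heap, which is what the docValues and /export suggestions in the replies address.

```python
import json
from typing import Callable, Dict, Iterator, List
from urllib.parse import urlencode
from urllib.request import urlopen

Fetch = Callable[[Dict[str, str]], dict]


def iter_ids(fetch: Fetch, rows: int = 1000, id_field: str = "id") -> Iterator[str]:
    """Yield every id in one core by walking it with cursorMark."""
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        params = {
            "q": "*:*",
            "fl": id_field,
            "rows": str(rows),
            # cursorMark needs a sort that includes the uniqueKey field
            "sort": f"{id_field} asc",
            "cursorMark": cursor,
            "distrib": "false",  # query this one shard replica only
            "wt": "json",
        }
        resp = fetch(params)
        for doc in resp["response"]["docs"]:
            yield doc[id_field]
        next_cursor = resp["nextCursorMark"]
        # When nothing is left, Solr hands back the same cursorMark we sent.
        if next_cursor == cursor:
            return
        cursor = next_cursor


def http_fetch(core_url: str) -> Fetch:
    """Hypothetical helper: GET <core_url>/select and decode the JSON."""
    def fetch(params: Dict[str, str]) -> dict:
        with urlopen(f"{core_url}/select?{urlencode(params)}") as resp:
            return json.load(resp)
    return fetch


def fake_fetch(all_ids: List[str]) -> Fetch:
    """In-memory stand-in for Solr, for trying the loop without a server.

    Its cursor is just the last id returned, which only mimics the paging
    contract; real cursorMark values are opaque strings.
    """
    ordered = sorted(all_ids)

    def fetch(params: Dict[str, str]) -> dict:
        cursor = params["cursorMark"]
        start = 0 if cursor == "*" else ordered.index(cursor) + 1
        page = ordered[start:start + int(params["rows"])]
        return {
            "response": {"docs": [{"id": i} for i in page]},
            "nextCursorMark": page[-1] if page else cursor,
        }
    return fetch


if __name__ == "__main__":
    print(list(iter_ids(fake_fetch(["c", "a", "e", "b", "d"]), rows=2)))
```

Run as a script, it prints ['a', 'b', 'c', 'd', 'e'] from the in-memory stand-in. Against a real core you would pass `http_fetch(...)` with your own core URL instead.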