Re: Joining more than 2 collections

2017-05-02 Thread Zheng Lin Edwin Yeo
Hi Joel,

Thanks for the info.

Regards,
Edwin


On 3 May 2017 at 02:04, Joel Bernstein  wrote:

> Also take a look at the documentation for the "fetch" streaming expression.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein  wrote:
>
> > Yes, you can join more than one collection with Streaming Expressions.
> > Here are a few things to keep in mind.
> >
> > * You'll likely want to use the parallel function around the largest
> join.
> > You'll need to use the join keys as the partitionKeys.
> > * innerJoin: requires that the streams be sorted on the join keys.
> > * innerHashJoin: has no sorting requirement.
> >
> > So a strategy for a three collection join might look like this:
> >
> > innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
> >
> > The largest join can be done in parallel using an innerJoin. You can then
> > wrap the stream coming out of the parallel function in an innerHashJoin
> to
> > join it to another stream.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > wrote:
> >
> >> Hi,
> >>
> >> Is it possible to join more than 2 collections using one of the
> streaming
> >> expressions (Eg: innerJoin)? If not, is there other ways we can do it?
> >>
> >> Currently, I may need to join 3 or 4 collections together, and to output
> >> selected fields from all these collections together.
> >>
> >> I'm using Solr 6.4.2.
> >>
> >> Regards,
> >> Edwin
> >>
> >
> >
>


Re: Suggester uses lots of 'Page cache' memory

2017-05-02 Thread Damien Kamerman
Thanks Shawn, I'll have to look closer into this.

On 3 May 2017 at 12:10, Shawn Heisey  wrote:

> On 5/2/2017 6:46 PM, Damien Kamerman wrote:
> > Shalin, yes I think it's a case of the Suggester build hitting the index
> > all at once. I'm thinking it's hitting all docs, even the ones without
> > fields relevant to the suggester.
> >
> > Shawn, I am using ZFS, though I think it's comparable to other setups.
> > mmap() should still be faster, while the ZFS ARC cache may prefer more
> > memory than other OS disk caches.
> >
> > So, it sounds like I need enough memory/swap to hold the entire index.
> > When will the memory be released? On a commit?
> > https://lucene.apache.org/core/6_5_0/core/org/apache/
> lucene/store/MMapDirectory.html
> > talks about a bug on the close().
>
> What I'm going to describe below is how things *normally* work on most
> operating systems (think Linux or Windows) with most filesystems.  If
> ZFS is different, and it sounds like it might be, then that's something
> for you to discuss with Oracle.
>
> Normally, MMap doesn't *allocate* any memory -- so there's nothing to
> release later.  It asks the operating system to map the file's contents
> to a section of virtual memory, and then the program accesses that
> memory block directly.
>
> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>
> A typical OS takes care of translating accesses to MMap virtual memory
> into disk accesses, and uses available system memory to cache the data
> that's read so a subsequent access of the same data is super fast.
>
> On most operating systems, memory in the disk cache is always available
> to programs that request it for an allocation.
>
> ZFS uses a completely separate piece of memory for caching -- the ARC
> cache.  I do not know if the OS is able to release memory from that
> cache when a program requests it.  My experience with ZFS on Linux  (not
> with Solr) suggests that the ARC cache holds onto memory a lot tighter
> than the standard OS disk cache.  ZFS on Solaris might be a different
> animal, though.
>
> I'm finding conflicting information regarding MMap problems on ZFS.
> Some sources say that memory usage is doubled (data in both the standard
> page cache and the arc cache), some say that this is not a general
> problem.  This is probably a question for Oracle to answer.
>
> You don't want to count swap space when looking at how much memory you
> have.  Swap performance is REALLY bad.
>
> Thanks,
> Shawn
>
>


Re: Suggester uses lots of 'Page cache' memory

2017-05-02 Thread Shawn Heisey
On 5/2/2017 6:46 PM, Damien Kamerman wrote:
> Shalin, yes I think it's a case of the Suggester build hitting the index
> all at once. I'm thinking it's hitting all docs, even the ones without
> fields relevant to the suggester.
>
> Shawn, I am using ZFS, though I think it's comparable to other setups.
> mmap() should still be faster, while the ZFS ARC cache may prefer more
> memory than other OS disk caches.
>
> So, it sounds like I need enough memory/swap to hold the entire index. When will
> the memory be released? On a commit?
> https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/store/MMapDirectory.html
> talks about a bug on the close().

What I'm going to describe below is how things *normally* work on most
operating systems (think Linux or Windows) with most filesystems.  If
ZFS is different, and it sounds like it might be, then that's something
for you to discuss with Oracle.

Normally, MMap doesn't *allocate* any memory -- so there's nothing to
release later.  It asks the operating system to map the file's contents
to a section of virtual memory, and then the program accesses that
memory block directly.

http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

A typical OS takes care of translating accesses to MMap virtual memory
into disk accesses, and uses available system memory to cache the data
that's read so a subsequent access of the same data is super fast.

On most operating systems, memory in the disk cache is always available
to programs that request it for an allocation.

ZFS uses a completely separate piece of memory for caching -- the ARC
cache.  I do not know if the OS is able to release memory from that
cache when a program requests it.  My experience with ZFS on Linux  (not
with Solr) suggests that the ARC cache holds onto memory a lot tighter
than the standard OS disk cache.  ZFS on Solaris might be a different
animal, though.

I'm finding conflicting information regarding MMap problems on ZFS. 
Some sources say that memory usage is doubled (data in both the standard
page cache and the arc cache), some say that this is not a general
problem.  This is probably a question for Oracle to answer.

You don't want to count swap space when looking at how much memory you
have.  Swap performance is REALLY bad.

Thanks,
Shawn



Re: Suggester uses lots of 'Page cache' memory

2017-05-02 Thread Damien Kamerman
Shalin, yes I think it's a case of the Suggester build hitting the index
all at once. I'm thinking it's hitting all docs, even the ones without
fields relevant to the suggester.

Shawn, I am using ZFS, though I think it's comparable to other setups.
mmap() should still be faster, while the ZFS ARC cache may prefer more
memory than other OS disk caches.

So, it sounds like I need enough memory/swap to hold the entire index. When will
the memory be released? On a commit?
https://lucene.apache.org/core/6_5_0/core/org/apache/lucene/store/MMapDirectory.html
talks about a bug on the close().


On 2 May 2017 at 23:07, Shawn Heisey  wrote:

> On 5/1/2017 10:52 PM, Damien Kamerman wrote:
> > I have a Solr v6.4.2 collection with 12 shards and 2 replicas. Each
> > replica uses about 14GB disk usage. I'm using Solaris 11 and I see the
> > 'Page cache' grow by about 7GB for each suggester replica I build. The
> > suggester index itself is very small. The 'Page cache' memory is freed
> > when the node is stopped. I guess the Suggester component is mmap'ing
> > the entire Lucene index into memory and holding it? Is this expected
> > behavior? Is there a workaround?
>
> I found the following.  The last comment on the answer, the one about
> mmap causing double-buffering with ZFS, is possibly relevant:
>
> https://serverfault.com/a/270604
>
> What filesystem are your indexes on?  If it's ZFS, it could completely
> explain the behavior.  If it's not ZFS, then the only part of it that I
> cannot explain is the fact that the page cache is freed when Solr stops.
>
> If this double-buffering actually means that the memory is allocated
> twice, then I think that ZFS is probably the wrong filesystem to run
> Solr on, unless you have a LOT of spare memory.  You could try changing
> the directory factory to one that doesn't use MMAP, but the suggester
> index factory probably cannot be easily changed.  This is too bad --
> normally MMAP is far more efficient than "standard" filesystem access.
>
> I could be reaching completely wrong conclusions based on the limited
> research I did.
>
> Thanks,
> Shawn
>
>


Re: Reload an unloaded core

2017-05-02 Thread David Lee

I have similar needs but for a slightly different use-case.

In my case, I am breaking up cores / indexes based on the month and year 
so that I can add an alias that always points to the last few months, 
but beyond that I want to simply unload the other indexes once they get 
past a few months old. The indexes will remain on disk but I simply 
don't want my queries to have to go through the older "archived" documents.


However, users will occasionally need to have those indexes reloaded for 
research reasons so what I was doing in ES was simply re-loading all of 
the indexes that fit within the range being searched for and added those 
to an alias (let's call it "archived", for example). Once they are 
finished querying on that older data, I again unload those indexes and 
remove the alias.


From what I'm reading in this thread, this isn't quite as 
straight-forward in Solr so I'm looking for other options.
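(For the alias piece at least, SolrCloud collection aliases would presumably map 
over -- something like the following, with invented collection names:)

# point an "archived" alias at the older monthly collections while they're needed
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=archived&collections=logs_2016_11,logs_2016_12"

# and drop it again once the research queries are done
curl "http://localhost:8983/solr/admin/collections?action=DELETEALIAS&name=archived"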


Thanks,
David

On 5/2/2017 5:04 PM, Shashank Pedamallu wrote:

Thank you Simon, Erick and Shawn for your replies. Unfortunately, restarting 
Solr is not an option for me. So, I’ll try to follow the steps given by Shawn to 
see where I’m standing. Btw, I’m using Solr 6.4.2.

Shawn, once again thank you very much for the detailed reply.

Thanks,
Shashank Pedamallu







On 5/2/17, 2:51 PM, "Shawn Heisey"  wrote:


On 5/2/2017 10:53 AM, Shashank Pedamallu wrote:

I want to unload a core from Solr without deleting data-dir or instance-dir. 
I’m performing some operations on the data-dir after this and then I would like 
to reload the core from the same data-dir. These are the things I tried:

   1.  Reload api – throws an exception saying no such core exists.
   2.  Create api – throws an exception saying a core with given name already 
exists.

Can someone point me what api I could use to achieve this. Please note that, 
I’m working with Solr in Non-Cloud mode without Zookeeper, Collections, etc.

The RELOAD command isn't going to work at all because the core has been
unloaded -- Solr doesn't know about the core, so it can't reload it.
This is a case where the language used is somewhat confusing, even
though it's completely correct.

I am about 90 percent certain that the reason the CREATE command gave
you an error message is because you tried to make a new core.properties
file before you did the CREATE.  When things are working correctly, the
CREATE command itself is what will create core.properties.  If it
already exists, CoreAdmin will give you an error.  This is the exact
text of the error I encountered when trying to use CREATE after building
a core.properties file manually:

Error CREATEing SolrCore 'foo': Could not create a new core in
C:\Users\sheisey\Downloads\solr-6.5.1\server\solr\foo as another core is
already defined there

That error message is confusing, so I will be fixing it:

https://issues.apache.org/jira/browse/SOLR-10599

To verify what you need to do, I fired up Solr 6.5.1 from an extracted
download directory.  I created two cores, "foo" and "bar", using the
commandline "bin\solr create" command.  Then I went to the admin UI and
unloaded foo.  The foo directory was still there, but the core was gone

from Solr's list.

By clicking on the "Add Core" button in the Core Admin tab, typing "foo"
into name and instanceDir, and clearing the other text boxes, the core
was recreated exactly as it was before it was unloaded.

This is the log from the CREATE command that the admin UI sent:

2017-05-02 18:02:49.232 INFO  (qtp1543727556-18) [   x:foo]
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
params={schema=&dataDir=&name=foo&action=CREATE&config=&instanceDir=foo&wt=json&_=1493747904891}
status=0 QTime=396

To double-check this and show how it can be done without the admin UI, I
accessed these two URLs (in a browser), and accomplished the exact same
thing again.  The first URL unloads the core, the second asks Solr to
find the core and re-add it with default settings.

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=foo
http://localhost:8983/solr/admin/cores?action=CREATE&name=foo&instanceDir=foo

If you are using additional options with your cores, such as the
configset parameter, you would need to include those options on your
CREATE call, similar to what you might have done when you initially
created the core.  With some of the options you can use, re-adding a core might be 
better done by re-creating the correct core.properties file and restarting Solr so 
it discovers the core.

Re: Reload an unloaded core

2017-05-02 Thread Shashank Pedamallu
Thank you Simon, Erick and Shawn for your replies. Unfortunately, restarting 
Solr is not an option for me. So, I’ll try to follow the steps given by Shawn to 
see where I’m standing. Btw, I’m using Solr 6.4.2.

Shawn, once again thank you very much for the detailed reply.

Thanks,
Shashank Pedamallu







On 5/2/17, 2:51 PM, "Shawn Heisey"  wrote:

>On 5/2/2017 10:53 AM, Shashank Pedamallu wrote:
>> I want to unload a core from Solr without deleting data-dir or instance-dir. 
>> I’m performing some operations on the data-dir after this and then I would 
>> like to reload the core from the same data-dir. These are the things I tried:
>>
>>   1.  Reload api – throws an exception saying no such core exists.
>>   2.  Create api – throws an exception saying a core with given name already 
>> exists.
>>
>> Can someone point me what api I could use to achieve this. Please note that, 
>> I’m working with Solr in Non-Cloud mode without Zookeeper, Collections, etc.
>
>The RELOAD command isn't going to work at all because the core has been
>unloaded -- Solr doesn't know about the core, so it can't reload it. 
>This is a case where the language used is somewhat confusing, even
>though it's completely correct.
>
>I am about 90 percent certain that the reason the CREATE command gave
>you an error message is because you tried to make a new core.properties
>file before you did the CREATE.  When things are working correctly, the
>CREATE command itself is what will create core.properties.  If it
>already exists, CoreAdmin will give you an error.  This is the exact
>text of the error I encountered when trying to use CREATE after building
>a core.properties file manually:
>
>Error CREATEing SolrCore 'foo': Could not create a new core in
>C:\Users\sheisey\Downloads\solr-6.5.1\server\solr\foo as another core is
>already defined there
>
>That error message is confusing, so I will be fixing it:
>
>https://issues.apache.org/jira/browse/SOLR-10599
>
>To verify what you need to do, I fired up Solr 6.5.1 from an extracted
>download directory.  I created two cores, "foo" and "bar", using the
>commandline "bin\solr create" command.  Then I went to the admin UI and
>unloaded foo.  The foo directory was still there, but the core was gone
>from Solr's list.
>
>By clicking on the "Add Core" button in the Core Admin tab, typing "foo"
>into name and instanceDir, and clearing the other text boxes, the core
>was recreated exactly as it was before it was unloaded.
>
>This is the log from the CREATE command that the admin UI sent:
>
>2017-05-02 18:02:49.232 INFO  (qtp1543727556-18) [   x:foo]
>o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
>params={schema=&dataDir=&name=foo&action=CREATE&config=&instanceDir=foo&wt=json&_=1493747904891}
>status=0 QTime=396
>
>To double-check this and show how it can be done without the admin UI, I
>accessed these two URLs (in a browser), and accomplished the exact same
>thing again.  The first URL unloads the core, the second asks Solr to
>find the core and re-add it with default settings.
>
>http://localhost:8983/solr/admin/cores?action=UNLOAD&core=foo
>http://localhost:8983/solr/admin/cores?action=CREATE&name=foo&instanceDir=foo
>
>If you are using additional options with your cores, such as the
>configset parameter, you would need to include those options on your
>CREATE call, similar to what you might have done when you initially
>created the core.  With some of the options you can use, re-adding a
>core might be better done by re-creating the correct core.properties
>file and restarting Solr so it discovers the core.
>
>Erick, I seem to remember the "core.properties.unloaded" rename
>happening in the past as well, but when I unloaded on 6.5.1, the
>core.properties file was simply deleted.  I don't think that's a good
>idea because it may contain information that's not available anywhere else.
>
>Thanks,
>Shawn
>


Re: Reload an unloaded core

2017-05-02 Thread Shawn Heisey
On 5/2/2017 10:53 AM, Shashank Pedamallu wrote:
> I want to unload a core from Solr without deleting data-dir or instance-dir. 
> I’m performing some operations on the data-dir after this and then I would 
> like to reload the core from the same data-dir. These are the things I tried:
>
>   1.  Reload api – throws an exception saying no such core exists.
>   2.  Create api – throws an exception saying a core with given name already 
> exists.
>
> Can someone point me what api I could use to achieve this. Please note that, 
> I’m working with Solr in Non-Cloud mode without Zookeeper, Collections, etc.

The RELOAD command isn't going to work at all because the core has been
unloaded -- Solr doesn't know about the core, so it can't reload it. 
This is a case where the language used is somewhat confusing, even
though it's completely correct.

I am about 90 percent certain that the reason the CREATE command gave
you an error message is because you tried to make a new core.properties
file before you did the CREATE.  When things are working correctly, the
CREATE command itself is what will create core.properties.  If it
already exists, CoreAdmin will give you an error.  This is the exact
text of the error I encountered when trying to use CREATE after building
a core.properties file manually:

Error CREATEing SolrCore 'foo': Could not create a new core in
C:\Users\sheisey\Downloads\solr-6.5.1\server\solr\foo as another core is
already defined there

That error message is confusing, so I will be fixing it:

https://issues.apache.org/jira/browse/SOLR-10599

To verify what you need to do, I fired up Solr 6.5.1 from an extracted
download directory.  I created two cores, "foo" and "bar", using the
commandline "bin\solr create" command.  Then I went to the admin UI and
unloaded foo.  The foo directory was still there, but the core was gone
from Solr's list.

By clicking on the "Add Core" button in the Core Admin tab, typing "foo"
into name and instanceDir, and clearing the other text boxes, the core
was recreated exactly as it was before it was unloaded.

This is the log from the CREATE command that the admin UI sent:

2017-05-02 18:02:49.232 INFO  (qtp1543727556-18) [   x:foo]
o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores
params={schema=&dataDir=&name=foo&action=CREATE&config=&instanceDir=foo&wt=json&_=1493747904891}
status=0 QTime=396

To double-check this and show how it can be done without the admin UI, I
accessed these two URLs (in a browser), and accomplished the exact same
thing again.  The first URL unloads the core, the second asks Solr to
find the core and re-add it with default settings.

http://localhost:8983/solr/admin/cores?action=UNLOAD&core=foo
http://localhost:8983/solr/admin/cores?action=CREATE&name=foo&instanceDir=foo

If you are using additional options with your cores, such as the
configset parameter, you would need to include those options on your
CREATE call, similar to what you might have done when you initially
created the core.  With some of the options you can use, re-adding a
core might be better done by re-creating the correct core.properties
file and restarting Solr so it discovers the core.
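For example, a core that was originally created against a configset might need 
something like this when it is re-added (the configSet and dataDir values below 
are placeholders):

# re-create the unloaded core with the same options it was first created with
curl "http://localhost:8983/solr/admin/cores?action=CREATE&name=foo&instanceDir=foo&dataDir=data&configSet=myconfigset"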

Erick, I seem to remember the "core.properties.unloaded" rename
happening in the past as well, but when I unloaded on 6.5.1, the
core.properties file was simply deleted.  I don't think that's a good
idea because it may contain information that's not available anywhere else.

Thanks,
Shawn



Re: solr-6.3.0 error port is running already

2017-05-02 Thread Rick Leir
Satya
Say netstat --inet -lP
You might need to add -ipv4 to that command. The P might be lower case (I am on 
the bus!). And the output might show misleading service names, see 
/etc/services.
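Something along these lines may work (Linux flags, and 8983 is just the default
Solr port -- adjust both as needed):

# listening TCP sockets, numeric ports, and the owning process
sudo netstat --inet -lnp | grep 8983
# or, if netstat isn't installed
sudo ss -lntp | grep 8983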
Cheers-- Rick

On May 2, 2017 3:10:30 PM EDT, Satya Marivada  wrote:
>Hi,
>
>I am getting the below exception all of a sudden with solr-6.3.0.
>"null:org.apache.solr.common.SolrException: A previous ephemeral live
>node
>still exists. Solr cannot continue. Please ensure that no other Solr
>process using the same port is running already."
>
>We are using external zookeeper and have restarted solr many times.
>There
>is no solr running on those ports already. Any suggestions. Looks like
>a
>bug. Had started using jmx option and then started getting it. Turned
>jmx
>off, still getting the same issue.
>
>We are in crunch of time, any workaround to get it started would be
>helpful. Not sure where solr is seeing that port, when everything is
>started clean.
>
>Thanks,
>Satya

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Reload an unloaded core

2017-05-02 Thread simon
the core.properties file definitely disappears if you use a configset, as in

#
#Written by CorePropertiesLocator
#Tue May 02 20:19:40 UTC 2017
name=testcore
dataDir=/indexes/solrindexes/testcore
configSet=myconf

Using a conf directory, as in

#Written by CorePropertiesLocator
#Tue May 02 20:30:44 UTC 2017
name=testcorewithconf
schema=conf/schema.xml
dataDir=/indexes/solrindexes/testcorewithconf

has the same behavior.

This is Solr 6.3.0 standalone, and I share your memory that at one point in
the distant past core.properties was renamed on an unload.

Probably worth submitting a JIRA

-Simon

On Tue, May 2, 2017 at 4:04 PM, Erick Erickson 
wrote:

> IIRC, the core.properties file _is_ renamed to
> core.properties.unloaded or something like that.
>
> Yeah, this is something of a pain. The inverse of "unload" is "create"
> but you have to know exactly how to create a core, and in SolrCloud
> mode that's...interesting. It's much safer to bring the Solr node
> down, do what you want then start it up, although not always possible.
>
> Best,
> Erick
>
> On Tue, May 2, 2017 at 10:55 AM, simon  wrote:
> > I ran into the exact same situation recently.  I unloaded from the
> browser
> > GUI which does not delete the data or instance dirs, but does delete
> > core.properties.  I couldn't find any API  either so I eventually
> manually
> > recreated core.properties and restarted Solr.
> >
> > Would be nice if the core.properties file were to be renamed rather than
> > deleted and if there were a RESCAN action to scan for unloaded cores and
> > reload them.
> >
> > On Tue, May 2, 2017 at 12:53 PM, Shashank Pedamallu <
> spedama...@vmware.com>
> > wrote:
> >
> >> Hi all,
> >>
> >> I want to unload a core from Solr without deleting data-dir or
> >> instance-dir. I’m performing some operations on the data-dir after this
> and
> >> then I would like to reload the core from the same data-dir. These are
> the
> >> things I tried:
> >>
> >>   1.  Reload api – throws an exception saying no such core exists.
> >>   2.  Create api – throws an exception saying a core with given name
> >> already exists.
> >>
> >> Can someone point me what api I could use to achieve this. Please note
> >> that, I’m working with Solr in Non-Cloud mode without Zookeeper,
> >> Collections, etc.
> >>
> >> Thanks in advance!
> >>
> >> Thanks,
> >> Shashank Pedamallu
> >>
>


Re: Solr performance on EC2 linux

2017-05-02 Thread Tomás Fernández Löbbe
I remember seeing some performance impact (even when not using it) and it
was attributed to the calls to System.nanoTime. See SOLR-7875 and SOLR-7876
(fixed for 5.3 and 5.4). Those two Jiras fix the impact when timeAllowed is
not used, but I don't know if there were more changes to improve the
performance of the feature itself. The problem was that System.nanoTime may
be called too many times on indices with many different terms. If this is
the problem Jeff is seeing, a small degradation of System.nanoTime could
have a big impact.

Tomás

On Tue, May 2, 2017 at 10:23 AM, Walter Underwood 
wrote:

> Hmm, has anyone measured the overhead of timeAllowed? We use it all the
> time.
>
> If nobody has, I’ll run a benchmark with and without it.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>
> > On May 2, 2017, at 9:52 AM, Chris Hostetter 
> wrote:
> >
> >
> > : I specify a timeout on all queries, 
> >
> > Ah -- ok, yeah -- you mean using "timeAllowed" correct?
> >
> > If the root issue you were seeing is in fact clocksource related,
> > then using timeAllowed would probably be a significant compounding
> > factor there since it would involve a lot of time checks in a single
> > request (even w/o any debugging enabled)
> >
> > (did your coworker's experiments with ES use any sort of equivalent
> > timeout feature?)
> >
> >
> >
> >
> >
> > -Hoss
> > http://www.lucidworks.com/
>
>


Re: Reload an unloaded core

2017-05-02 Thread Erick Erickson
IIRC, the core.properties file _is_ renamed to
core.properties.unloaded or something like that.

Yeah, this is something of a pain. The inverse of "unload" is "create"
but you have to know exactly how to create a core, and in SolrCloud
mode that's...interesting. It's much safer to bring the Solr node
down, do what you want then start it up, although not always possible.

Best,
Erick

On Tue, May 2, 2017 at 10:55 AM, simon  wrote:
> I ran into the exact same situation recently.  I unloaded from the browser
> GUI which does not delete the data or instance dirs, but does delete
> core.properties.  I couldn't find any API  either so I eventually manually
> recreated core.properties and restarted Solr.
>
> Would be nice if the core.properties file were to be renamed rather than
> deleted and if there were a RESCAN action to scan for unloaded cores and
> reload them.
>
> On Tue, May 2, 2017 at 12:53 PM, Shashank Pedamallu 
> wrote:
>
>> Hi all,
>>
>> I want to unload a core from Solr without deleting data-dir or
>> instance-dir. I’m performing some operations on the data-dir after this and
>> then I would like to reload the core from the same data-dir. These are the
>> things I tried:
>>
>>   1.  Reload api – throws an exception saying no such core exists.
>>   2.  Create api – throws an exception saying a core with given name
>> already exists.
>>
>> Can someone point me what api I could use to achieve this. Please note
>> that, I’m working with Solr in Non-Cloud mode without Zookeeper,
>> Collections, etc.
>>
>> Thanks in advance!
>>
>> Thanks,
>> Shashank Pedamallu
>>


Re: solr-6.3.0 error port is running already

2017-05-02 Thread Erick Erickson
Well, if an ephemeral node exists, restarting your Zookeeper ensemble
will delete it. Not sure what the precursor here is.

Are you absolutely and totally sure you don't have a solr process
still running on the node you try and start the shows this error? 'ps
aux | grep solr' will show you all of the running Solr instances on a
machine.
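A quick way to double-check both the process side and the ZooKeeper side (a
sketch; the ZooKeeper host, chroot and port are assumptions):

# any Solr process, or anything else bound to the Solr port (8983 assumed)?
ps aux | grep [s]olr
sudo netstat -lntp | grep 8983

# is a stale entry still registered under live_nodes?
# (the /solr chroot is an assumption -- match it to your zkHost string)
/path/to/zookeeper/bin/zkCli.sh -server zk1:2181 ls /solr/live_nodes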

Best,
Erick

On Tue, May 2, 2017 at 12:45 PM, Satya Marivada
 wrote:
> Any ideas?  "null:org.apache.solr.common.SolrException: A previous
> ephemeral live node still exists. Solr cannot continue. Please ensure that
> no other Solr process using the same port is running already."
>
> Not sure, if JMX enablement has caused this.
>
> Thanks,
> Satya
>
> On Tue, May 2, 2017 at 3:10 PM Satya Marivada 
> wrote:
>
>> Hi,
>>
>> I am getting the below exception all of a sudden with solr-6.3.0.
>> "null:org.apache.solr.common.SolrException: A previous ephemeral live node
>> still exists. Solr cannot continue. Please ensure that no other Solr
>> process using the same port is running already."
>>
>> We are using external zookeeper and have restarted solr many times. There
>> is no solr running on those ports already. Any suggestions. Looks like a
>> bug. Had started using jmx option and then started getting it. Turned jmx
>> off, still getting the same issue.
>>
>> We are in crunch of time, any workaround to get it started would be
>> helpful. Not sure where solr is seeing that port, when everything is
>> started clean.
>>
>> Thanks,
>> Satya
>>


Re: solr-6.3.0 error port is running already

2017-05-02 Thread Satya Marivada
Any ideas?  "null:org.apache.solr.common.SolrException: A previous
ephemeral live node still exists. Solr cannot continue. Please ensure that
no other Solr process using the same port is running already."

Not sure, if JMX enablement has caused this.

Thanks,
Satya

On Tue, May 2, 2017 at 3:10 PM Satya Marivada 
wrote:

> Hi,
>
> I am getting the below exception all of a sudden with solr-6.3.0.
> "null:org.apache.solr.common.SolrException: A previous ephemeral live node
> still exists. Solr cannot continue. Please ensure that no other Solr
> process using the same port is running already."
>
> We are using external zookeeper and have restarted solr many times. There
> is no solr running on those ports already. Any suggestions. Looks like a
> bug. Had started using jmx option and then started getting it. Turned jmx
> off, still getting the same issue.
>
> We are in crunch of time, any workaround to get it started would be
> helpful. Not sure where solr is seeing that port, when everything is
> started clean.
>
> Thanks,
> Satya
>


solr-6.3.0 error port is running already

2017-05-02 Thread Satya Marivada
Hi,

I am getting the below exception all of a sudden with solr-6.3.0.
"null:org.apache.solr.common.SolrException: A previous ephemeral live node
still exists. Solr cannot continue. Please ensure that no other Solr
process using the same port is running already."

We are using external zookeeper and have restarted solr many times. There
is no solr running on those ports already. Any suggestions. Looks like a
bug. Had started using jmx option and then started getting it. Turned jmx
off, still getting the same issue.

We are in crunch of time, any workaround to get it started would be
helpful. Not sure where solr is seeing that port, when everything is
started clean.

Thanks,
Satya


Re: Joining more than 2 collections

2017-05-02 Thread Joel Bernstein
Also take a look at the documentation for the "fetch" streaming expression.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, May 2, 2017 at 2:03 PM, Joel Bernstein  wrote:

> Yes, you can join more than one collection with Streaming Expressions. Here are
> a few things to keep in mind.
>
> * You'll likely want to use the parallel function around the largest join.
> You'll need to use the join keys as the partitionKeys.
> * innerJoin: requires that the streams be sorted on the join keys.
> * innerHashJoin: has no sorting requirement.
>
> So a strategy for a three collection join might look like this:
>
> innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)
>
> The largest join can be done in parallel using an innerJoin. You can then
> wrap the stream coming out of the parallel function in an innerHashJoin to
> join it to another stream.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo 
> wrote:
>
>> Hi,
>>
>> Is it possible to join more than 2 collections using one of the streaming
>> expressions (Eg: innerJoin)? If not, is there other ways we can do it?
>>
>> Currently, I may need to join 3 or 4 collections together, and to output
>> selected fields from all these collections together.
>>
>> I'm using Solr 6.4.2.
>>
>> Regards,
>> Edwin
>>
>
>


Re: Joining more than 2 collections

2017-05-02 Thread Joel Bernstein
Yes, you can join more than one collection with Streaming Expressions. Here are
a few things to keep in mind.

* You'll likely want to use the parallel function around the largest join.
You'll need to use the join keys as the partitionKeys.
* innerJoin: requires that the streams be sorted on the join keys.
* innerHashJoin: has no sorting requirement.

So a strategy for a three collection join might look like this:

innerHashJoin(parallel(innerJoin(bigStream, bigStream)), smallerStream)

The largest join can be done in parallel using an innerJoin. You can then
wrap the stream coming out of the parallel function in an innerHashJoin to
join it to another stream.
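As a rough, untested sketch (the collection names people/pets/addresses and the
join key personId are made up here, and the hash join decorator is registered as
hashJoin in the expression syntax), the whole thing sent to the /stream handler
could look something like this:

# three-way join: the two big collections joined in parallel, then hash-joined to the small one
curl --data-urlencode 'expr=hashJoin(
    parallel(people,
      innerJoin(
        search(people, q="*:*", fl="personId,name", sort="personId asc", qt="/export", partitionKeys="personId"),
        search(pets, q="*:*", fl="personId,petName", sort="personId asc", qt="/export", partitionKeys="personId"),
        on="personId"),
      workers="2", sort="personId asc"),
    hashed=search(addresses, q="*:*", fl="personId,address", sort="personId asc"),
    on="personId")' \
  "http://localhost:8983/solr/people/stream"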

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, May 1, 2017 at 9:42 PM, Zheng Lin Edwin Yeo 
wrote:

> Hi,
>
> Is it possible to join more than 2 collections using one of the streaming
> expressions (Eg: innerJoin)? If not, is there other ways we can do it?
>
> Currently, I may need to join 3 or 4 collections together, and to output
> selected fields from all these collections together.
>
> I'm using Solr 6.4.2.
>
> Regards,
> Edwin
>


Re: Reload an unloaded core

2017-05-02 Thread simon
I ran into the exact same situation recently.  I unloaded from the browser
GUI which does not delete the data or instance dirs, but does delete
core.properties.  I couldn't find any API  either so I eventually manually
recreated core.properties and restarted Solr.

Would be nice if the core.properties file were to be renamed rather than
deleted and if there were a RESCAN action to scan for unloaded cores and
reload them.

On Tue, May 2, 2017 at 12:53 PM, Shashank Pedamallu 
wrote:

> Hi all,
>
> I want to unload a core from Solr without deleting data-dir or
> instance-dir. I’m performing some operations on the data-dir after this and
> then I would like to reload the core from the same data-dir. These are the
> things I tried:
>
>   1.  Reload api – throws an exception saying no such core exists.
>   2.  Create api – throws an exception saying a core with given name
> already exists.
>
> Can someone point me what api I could use to achieve this. Please note
> that, I’m working with Solr in Non-Cloud mode without Zookeeper,
> Collections, etc.
>
> Thanks in advance!
>
> Thanks,
> Shashank Pedamallu
>


Re: Solr performance on EC2 linux

2017-05-02 Thread Walter Underwood
Hmm, has anyone measured the overhead of timeAllowed? We use it all the time.

If nobody has, I’ll run a benchmark with and without it.
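(A minimal version of that comparison, with a placeholder collection name and
values, would just be the same query timed with and without the parameter:)

# %{time_total} prints wall-clock seconds for each request
curl -s -o /dev/null -w '%{time_total}\n' "http://localhost:8983/solr/mycollection/select?q=*:*&rows=10"
curl -s -o /dev/null -w '%{time_total}\n' "http://localhost:8983/solr/mycollection/select?q=*:*&rows=10&timeAllowed=500"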

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On May 2, 2017, at 9:52 AM, Chris Hostetter  wrote:
> 
> 
> : I specify a timeout on all queries, 
> 
> Ah -- ok, yeah -- you mean using "timeAllowed" correct?
> 
> If the root issue you were seeing is in fact clocksource related,
> then using timeAllowed would probably be a significant compounding 
> factor there since it would involve a lot of time checks in a single 
> request (even w/o any debugging enabled)
> 
> (did your coworker's experiments with ES use any sort of equivalent 
> timeout feature?)
> 
> 
> 
> 
> 
> -Hoss
> http://www.lucidworks.com/



Re: Clean checkbox on DIH

2017-05-02 Thread Mahmoud Almokadem
Thanks Shawn,

We already use the admin UI for testing and bulk uploads. We are using curl
scripts for automation process.

I'll report the issues regarding the new UI on JIRA.

Thanks,
Mahmoud


On Tuesday, May 2, 2017, Shawn Heisey  wrote:

> On 5/2/2017 6:53 AM, Mahmoud Almokadem wrote:
> > And for the dataimport I always use the old UI cause the new UI
> > doesn't show the live update and sometimes doesn't show the
> > configuration. I think there are many bugs on the new UI.
>
> Do you know if these problems have been reported in the Jira issue
> tracker?  The old UI is going to disappear in Solr 7.0 when it is
> released.  If there are bugs in the new UI, we need to have them
> reported so they can be fixed.
>
> As I stated earlier, when it comes to DIH, the admin UI is more useful
> for testing and research than actual usage.  The URLs for the admin UI
> cannot be used in automation tools -- the API must be used directly.
>
> Thanks,
> Shawn
>
>


Re: Solr performance on EC2 linux

2017-05-02 Thread Chris Hostetter

: I specify a timeout on all queries, 

Ah -- ok, yeah -- you mean using "timeAllowed" correct?

If the root issue you were seeing is in fact clocksource related,
then using timeAllowed would probably be a significant compounding 
factor there since it would involve a lot of time checks in a single 
request (even w/o any debugging enabled)

(did your coworker's experiments with ES use any sort of equivalent 
timeout feature?)





-Hoss
http://www.lucidworks.com/


Reload an unloaded core

2017-05-02 Thread Shashank Pedamallu
Hi all,

I want to unload a core from Solr without deleting data-dir or instance-dir. 
I’m performing some operations on the data-dir after this and then I would like 
to reload the core from the same data-dir. These are the things I tried:

  1.  Reload api – throws an exception saying no such core exists.
  2.  Create api – throws an exception saying a core with given name already 
exists.

Can someone point me what api I could use to achieve this. Please note that, 
I’m working with Solr in Non-Cloud mode without Zookeeper, Collections, etc.

Thanks in advance!

Thanks,
Shashank Pedamallu


Re: Clean checkbox on DIH

2017-05-02 Thread Rick Leir
Mahmoud,
Would it help to have field validation? If the DIH fields are still default 
when you press execute, then field validation puts out a message and blocks any 
clearing. Just an idea, please excuse if I am off track. -- Rick

On May 2, 2017 8:53:12 AM EDT, Mahmoud Almokadem  wrote:
>Thanks Shawn for your clarifications,
>
>I think showing a confirmation message saying that "The whole index
>will be
>cleaned" when the clean option is checked will be good.
>
>I always remove the check from the file
>/opt/solr/server/solr-webapp/webapp/tpl/dataimport.html after
>installing
>solr, but when I upgraded it this time I forgot to do that and pressed
>Execute
>with the check, and the whole index was cleaned.
>
>And for the dataimport I always use the old UI cause the new UI doesn't
>show the live update and sometimes doesn't show the configuration. I
>think
>there are many bugs on the new UI.
>
>Thanks,
>Mahmoud
>
>On Mon, May 1, 2017 at 4:30 PM, Shawn Heisey 
>wrote:
>
>> On 4/28/2017 9:01 AM, Mahmoud Almokadem wrote:
>> > We already using a shell scripts to do our import and using
>fullimport
>> > command to do our delta import and everything is doing well several
>> > years ago. But default of the UI is full import with clean and
>commit.
>> > If I press the Execute button by mistake the whole index is cleaned
>> > without any notification.
>>
>> I understand your frustration.  What I'm worried about is the fallout
>if
>> we change the default to be unchecked, from people who didn't verify
>the
>> setting and expected full-import to wipe their index before it
>started
>> importing, just like it has always done for the last few years.
>>
>> The default value for the clean parameter when NOT using the admin UI
>is
>> true for full-import, and false for delta-import.  That's not going
>to
>> change.  I firmly believe that the admin UI should have the same
>> defaults as the API itself.  The very nature of a full-import carries
>> the implication that you want to start over with an empty index.
>>
>> What if there were some bright red text in the UI near the execute
>> button that urged you to double-check that the "clean" box has the
>> setting you want?  An alternate idea would be to pop up a yes/no
>> verification dialog on execute when the clean box is checked.
>>
>> Thanks,
>> Shawn
>>
>>

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: choosing placement upon RESTORE

2017-05-02 Thread xavier jmlucjav
thanks Mikhail, that sounds like it would help me as it allows you to set
createNodeSet on RESTORE calls
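Once that lands, the calls would presumably look something like this (host names,
paths and the createNodeSet value are placeholders; the createNodeSet parameter on
RESTORE is what SOLR-9527 proposes):

# backup from node A; the location must be reachable by the node doing the work
curl "http://nodeA:8983/solr/admin/collections?action=BACKUP&name=mybackup&collection=mycoll&location=/backups"

# restore, pinning the restored replicas to node A
curl "http://nodeA:8983/solr/admin/collections?action=RESTORE&name=mybackup&collection=mycoll_restored&location=/backups&createNodeSet=nodeA:8983_solr"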

On Tue, May 2, 2017 at 2:50 PM, Mikhail Khludnev  wrote:

> This sounds relevant, but different to https://issues.apache.org/
> jira/browse/SOLR-9527
> You may want to follow this ticket.
>
> On Mon, May 1, 2017 at 9:15 PM, xavier jmlucjav 
> wrote:
>
>> hi,
>>
>> I am facing this situation:
>> - I have a 3 node Solr 6.1 with some 1 shard, 1 node collections (it's
>> just
>> for dev work)
>> - the collections were created with:
>>    action=CREATE&...&createNodeSet=EMPTY"
>> then
>>   action=ADDREPLICA&...&node=$NODEA&dataDir=$DATADIR"
>> - I have taken a BACKUP of the collections
>> - Solr is upgraded to 6.5.1
>>
>> Now, I started using RESTORE to restore the collections on the node A
>> (where they lived before), but, instead of all being created in node A,
>> collections have been created in A, then B, then C nodes. Well, Solrcloud
>> tried to, as 2nd and 3rd RESTOREs failed, as the backup was in node A's
>> disk, not reachable from nodes B and C.
>>
>> How is this supposed to work? I am looking at Rule Based Placement but it
>> seems it is only available for CREATESHARD, so can I use it in RESTORE?
>> Isn't there a way to force Solrcloud to create the collection in a given
>> node?
>>
>> thanks!
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: Add new Solr Node to existing Solr setup

2017-05-02 Thread Erick Erickson
Along with Shawn's comments, if you create a new collection,
consider "oversharding". Say you calculate (more on that below) that you
can fit your collection in N shards, but you expect your collection to
triple over time. _Start out_ with 3N shards; many of them will be
co-located. As you get more docs, move the replicas
around with ADDREPLICA/DELETEREPLICA as Shawn suggests.
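A sketch of that replica shuffle, with made-up collection, shard and node names:

# add a copy of shard1 on the new node, then drop the old copy
curl "http://localhost:8983/solr/admin/collections?action=ADDREPLICA&collection=mycoll&shard=shard1&node=newhost:8983_solr"
curl "http://localhost:8983/solr/admin/collections?action=DELETEREPLICA&collection=mycoll&shard=shard1&replica=core_node3"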

Finally, you really have to do some serious work to figure out what
the correct eventual size will be, see:

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Best,
Erick

On Tue, May 2, 2017 at 5:32 AM, Shawn Heisey  wrote:
> On 5/2/2017 4:24 AM, Venkateswarlu Bommineni wrote:
>> We have Solr setup with below configuration.
>>
>> 1) 1 collection with one shard
>> 2)  4 Solr Nodes
>> 3)  and replication factor 4 with one replica on each Solr Node.
>>
>> As of now, it's working fine. But going forward the size may grow and
>> we would need to add a new Node.
>>
>> Could you guys please suggest any idea?
>
> I'm assuming SolrCloud, because you said "collection" and "replication
> factor" which are SolrCloud concepts.
>
> As soon as you start the new node pointing at your zookeeper ensemble,
> it will be part of the cluster and will accept requests for any
> collection in the cluster.  No index data will end up on the new node
> until you take action with the Collections API, though.
>
> One way to put data on the new node is the ADDREPLICA action.  Another
> is to create a brand new collection with the shard and replication
> characteristics you want, and use the new collection instead of the old
> one, or create an alias to use whatever name you like.  You can use
> SPLITSHARD and then ADDREPLICA/DELETEREPLICA to put *some* of the data
> from an existing collection on the new node.
>
> https://cwiki.apache.org/confluence/display/solr/Collections+API
>
> I think the way I would proceed is to create a brand new collection set
> up with the correct number of shards and replicas to use the new node,
> populate that collection, delete the old collection, and set up a
> collection alias so that the new collection can be accessed with the old
> collection's name.
>
> Thanks,
> Shawn
>


Re: IndexFormatTooNewException - MapReduceIndexerTool for PDF files

2017-05-02 Thread Shawn Heisey
On 5/1/2017 10:48 PM, ecos wrote:
> The cause of the error is:
> org.apache.lucene.index.IndexFormatTooNewException: Format version is not
> supported (resource: BufferedChecksumIndexInput (segments_1)): 4 (needs to
> be between 0 and 3).
>
> Reading out there I found the exception is thrown when Lucene detects an
> index that is newer than the Lucene version.

> In order to index I'm following the tutorial:
> 
>  

Normally my guess would be that the Lucene/Solr version in the mapreduce
tool is at least 5.0, which would produce an index that Solr 4.10.3
cannot read.

I have located information saying that the Cloudera 5.8 version includes
Solr 4.10.3, which makes me wonder if maybe the Solr version that is
trying to read the index is perhaps older than 4.10.3.
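One way to pin down which side is at fault (a sketch; the jar and index paths are
placeholders) is to run Lucene's CheckIndex from the reading side's Lucene version
against the generated index -- it reports the format it finds, or fails with the
same too-new error:

java -cp /path/to/lucene-core-4.10.3.jar org.apache.lucene.index.CheckIndex /path/to/core/data/index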

You might need to go to Cloudera for support on this problem, because
you are running their software.

Thanks,
Shawn



Re: Clean checkbox on DIH

2017-05-02 Thread Shawn Heisey
On 5/2/2017 6:53 AM, Mahmoud Almokadem wrote:
> And for the dataimport I always use the old UI cause the new UI
> doesn't show the live update and sometimes doesn't show the
> configuration. I think there are many bugs on the new UI. 

Do you know if these problems have been reported in the Jira issue
tracker?  The old UI is going to disappear in Solr 7.0 when it is
released.  If there are bugs in the new UI, we need to have them
reported so they can be fixed.

As I stated earlier, when it comes to DIH, the admin UI is more useful
for testing and research than actual usage.  The URLs for the admin UI
cannot be used in automation tools -- the API must be used directly.
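For example, a script would call the handler with the clean behavior stated
explicitly (the core name is assumed, and the handler is assumed to be registered
at /dataimport):

# full import without wiping the index first -- clean defaults to true for full-import
curl "http://localhost:8983/solr/mycore/dataimport?command=full-import&clean=false&commit=true"
# delta import, where clean already defaults to false
curl "http://localhost:8983/solr/mycore/dataimport?command=delta-import&commit=true"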

Thanks,
Shawn



Re: Suggester uses lots of 'Page cache' memory

2017-05-02 Thread Shawn Heisey
On 5/1/2017 10:52 PM, Damien Kamerman wrote:
> I have a Solr v6.4.2 collection with 12 shards and 2 replicas. Each
> replica uses about 14GB disk usage. I'm using Solaris 11 and I see the
> 'Page cache' grow by about 7GB for each suggester replica I build. The
> suggester index itself is very small. The 'Page cache' memory is freed
> when the node is stopped. I guess the Suggester component is mmap'ing
> the entire Lucene index into memory and holding it? Is this expected
> behavior? Is there a workaround? 

I found the following.  The last comment on the answer, the one about
mmap causing double-buffering with ZFS, is possibly relevant:

https://serverfault.com/a/270604

What filesystem are your indexes on?  If it's ZFS, it could completely
explain the behavior.  If it's not ZFS, then the only part of it that I
cannot explain is the fact that the page cache is freed when Solr stops.

If this double-buffering actually means that the memory is allocated
twice, then I think that ZFS is probably the wrong filesystem to run
Solr on, unless you have a LOT of spare memory.  You could try changing
the directory factory to one that doesn't use MMAP, but the suggester
index factory probably cannot be easily changed.  This is too bad --
normally MMAP is far more efficient than "standard" filesystem access.

I could be reaching completely wrong conclusions based on the limited
research I did.

Thanks,
Shawn



Re: Clean checkbox on DIH

2017-05-02 Thread Mahmoud Almokadem
Thanks Shawn for your clarifications,

I think showing a confirmation message saying that "The whole index will be
cleaned" when the clean option is checked will be good.

I always remove the check from the file
/opt/solr/server/solr-webapp/webapp/tpl/dataimport.html after installing
solr, but when I upgraded it this time I forgot to do that and pressed Execute
with the check, and the whole index was cleaned.

And for the dataimport I always use the old UI cause the new UI doesn't
show the live update and sometimes doesn't show the configuration. I think
there are many bugs on the new UI.

Thanks,
Mahmoud

On Mon, May 1, 2017 at 4:30 PM, Shawn Heisey  wrote:

> On 4/28/2017 9:01 AM, Mahmoud Almokadem wrote:
> > We already using a shell scripts to do our import and using fullimport
> > command to do our delta import and everything is doing well several
> > years ago. But default of the UI is full import with clean and commit.
> > If I press the Execute button by mistake the whole index is cleaned
> > without any notification.
>
> I understand your frustration.  What I'm worried about is the fallout if
> we change the default to be unchecked, from people who didn't verify the
> setting and expected full-import to wipe their index before it started
> importing, just like it has always done for the last few years.
>
> The default value for the clean parameter when NOT using the admin UI is
> true for full-import, and false for delta-import.  That's not going to
> change.  I firmly believe that the admin UI should have the same
> defaults as the API itself.  The very nature of a full-import carries
> the implication that you want to start over with an empty index.
>
> What if there were some bright red text in the UI near the execute
> button that urged you to double-check that the "clean" box has the
> setting you want?  An alternate idea would be to pop up a yes/no
> verification dialog on execute when the clean box is checked.
>
> Thanks,
> Shawn
>
>


Re: IndexFormatTooNewException - MapReduceIndexerTool for PDF files

2017-05-02 Thread ravi432
Hi ecos,

Is it producing Solr documents when running the MapReduceIndexerTool in debug
mode?
If not, can you run it in debug mode and send out any errors?





Re: Add new Solr Node to existing Solr setup

2017-05-02 Thread Shawn Heisey
On 5/2/2017 4:24 AM, Venkateswarlu Bommineni wrote:
> We have Solr setup with below configuration.
> 
> 1) 1 collection with one shard
> 2)  4 Solr Nodes
> 3)  and replication factor 4 with one replica on each Solr Node.
>
> As of now, it's working fine. But going forward the size may grow and
> we would need to add a new Node.
> 
> Could you guys please suggest any idea?

I'm assuming SolrCloud, because you said "collection" and "replication
factor" which are SolrCloud concepts.

As soon as you start the new node pointing at your zookeeper ensemble,
it will be part of the cluster and will accept requests for any
collection in the cluster.  No index data will end up on the new node
until you take action with the Collections API, though.

One way to put data on the new node is the ADDREPLICA action.  Another
is to create a brand new collection with the shard and replication
characteristics you want, and use the new collection instead of the old
one, or create an alias to use whatever name you like.  You can use
SPLITSHARD and then ADDREPLICA/DELETEREPLICA to put *some* of the data
from an existing collection on the new node.

https://cwiki.apache.org/confluence/display/solr/Collections+API

I think the way I would proceed is to create a brand new collection set
up with the correct number of shards and replicas to use the new node,
populate that collection, delete the old collection, and set up a
collection alias so that the new collection can be accessed with the old
collection's name.
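Roughly, with made-up names (the shard/replica counts and config name are
placeholders):

# new collection sized for the extra node
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=mycoll_v2&numShards=2&replicationFactor=2&maxShardsPerNode=1&collection.configName=myconf"
# ... reindex into mycoll_v2, verify it, then swap the name over ...
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=mycoll"
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=mycoll&collections=mycoll_v2"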

Thanks,
Shawn



Add new Solr Node to existing Solr setup

2017-05-02 Thread Venkateswarlu Bommineni
Hello Team,

We have Solr setup with below configuration.

1) 1 collection with one shard
2)  4 Solr Nodes
3)  and replication factor 4 with one replica on each Solr Node.

As of now, it's working fine. But going forward the size may grow and
we would need to add a new Node.

Could you guys please suggest any idea?


Thanks,
Venkat.


Re: Suggester uses lots of 'Page cache' memory

2017-05-02 Thread Shalin Shekhar Mangar
On Tue, May 2, 2017 at 10:22 AM, Damien Kamerman  wrote:
> Hi all,
>
> I have a Solr v6.4.2 collection with 12 shards and 2 replicas. Each replica
> uses about 14GB disk usage. I'm using Solaris 11 and I see the 'Page cache'
> grow by about 7GB for each suggester replica I build. The suggester index
> itself is very small. The 'Page cache' memory is freed when the node is
> stopped.
>
> I guess the Suggester component is mmap'ing the entire Lucene index into
> memory and holding it? Is this expected behavior? Is there a workaround?
>

Yes, this is expected. The suggester opens the index using Lucene's
MMapDirectory which ends up memory mapping the index. But the memory
mapped pages can be shared across everyone using the same index which
basically means that the replica's usual index searcher can also use
these pages. If you were not building the suggester index, even then
the replica's search index would have been mmapped but perhaps only on
demand instead of all at once.

> I use this command to build the suggester for just the replica
> 'target1_shard1_replica1':
> curl "
> http://localhost:8983/solr/collection1/suggest?suggest.dictionary=mySuggester&suggest.build=true&shards=localhost:8983/solr/target1_shard1_replica1
> "
>
> BTW: Without the 'shards' param the distributed request will randomly hit
> half the replicas.

Yes, this is a problem. I recently opened
https://issues.apache.org/jira/browse/SOLR-10532 but unfortunately I
don't have the time to fix it soon. Patches are always welcome.

>
> From my solrconfig.xml:
> <searchComponent name="suggest" class="solr.SuggestComponent">
>   <lst name="suggester">
>     <str name="name">mySuggester</str>
>     <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
>     <str name="indexPath">mySuggester</str>
>     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
>     <str name="field">mySuggest</str>
>     <str name="weightField">x</str>
>     <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
>     <str name="buildOnStartup">false</str>
>   </lst>
> </searchComponent>
>
> Cheers,
> Damien.



-- 
Regards,
Shalin Shekhar Mangar.


Re: CDCR with SSL enabled

2017-05-02 Thread Xie, Sean
From the QUEUE action, the output is:


<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="queues"/>
  <long name="tlogTotalSize">34741356</long>
  <long name="tlogTotalCount">2</long>
  <str name="updateLogSynchronizer">stopped</str>
</response>




On 5/2/17, 1:43 AM, "Xie, Sean"  wrote:

Does CDCR support SSL encrypted SolrCloud?

I have two clusters started with SSL, and the CDCR setup instructions were 
followed on source and target. However, from the solr.log, I’m not able to see 
CDCR occurring. Not sure what has been set up incorrectly.

From the solr.log, I can’t find useful info related to CDCR during the 
indexing time. Any help on how to probe the issue is appreciated.

The Target config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="buffer">
      <str name="defaultState">disabled</str>
    </lst>
  </requestHandler>

  <requestHandler name="/update" class="solr.UpdateRequestHandler">
    <lst name="defaults">
      <str name="update.chain">cdcr-processor-chain</str>
    </lst>
  </requestHandler>

  <updateRequestProcessorChain name="cdcr-processor-chain">
    <processor class="solr.CdcrUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>

The Source config:

  <requestHandler name="/cdcr" class="solr.CdcrRequestHandler">
    <lst name="replica">
      <str name="zkHost">zk_ip:2181</str>
      <str name="source">SourceCollection</str>
      <str name="target">TargetCollection</str>
    </lst>
    <lst name="replicator">
      <str name="threadPoolSize">8</str>
      <str name="schedule">1000</str>
      <str name="batchSize">128</str>
    </lst>
    <lst name="updateLogSynchronizer">
      <str name="schedule">1000</str>
    </lst>
  </requestHandler>

  <updateHandler class="solr.DirectUpdateHandler2">
    <updateLog class="solr.CdcrUpdateLog">
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
  </updateHandler>
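One way to probe this from the API rather than the logs (a sketch; host names are
placeholders, and -k is only needed for self-signed certificates):

curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=STATUS"
curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=QUEUES"
curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=ERRORS"

# replication only begins after an explicit START on the source collection
curl -k "https://source-host:8983/solr/SourceCollection/cdcr?action=START"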
