Bringing Old Collections Up Again

2016-05-02 Thread Salman Ansari
Hi,

I am hosting Zookeeper ensemble and Solr servers on Microsoft cloud
(Azure). From time to time machines are forced to restart to install
updates. Recently, this happened again and it caused Zookeeper ensemble and
Solr instances to go down. When the machines came back up again, I tried
the following:

1) Started Zookeeper on all machines using the following command
zkServer.cmd (on all three machines)

2) Started Solr on two of those machines using

solr.cmd start -c -p 8983 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 8983 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"

After several trials, Solr did start on both machines, but *none of the
previous collections came back normally.* When I look at the admin page, it
shows errors such as the following:

*[Collection_name]_shard2_replica2:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Index locked for write for core '[Collection_name]_shard2_replica2'. Solr
now longer supports forceful unlocking via 'unlockOnStartup'. Please verify
locks manually!

So I am probably doing something wrong, or there is a different way of
bringing old collections back up.

Appreciate your comments/feedback regarding this.

Regards,
Salman
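For readers hitting the same "Index locked for write" error: it is typically caused by leftover write.lock files after an unclean shutdown. A minimal sketch (the Solr home path is an assumption, not from this thread) for locating them so they can be verified manually, as the error message advises:

```python
# Scan a Solr home directory for leftover write.lock files, which
# trigger "Index locked for write" errors after an unclean shutdown.
# The solr_home path is a placeholder; verify each lock manually
# before deleting anything, as the error message advises.
from pathlib import Path

def find_write_locks(solr_home):
    return sorted(Path(solr_home).rglob("write.lock"))
```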


Re: Problem in Issuing a Command to Upload Configuration

2016-05-02 Thread Salman Ansari
Well, that just happened! Solr and Zookeeper machines faced a forced
restart to install Windows Updates. This caused Zookeeper ensemble and Solr
instances to go down. When the machines came back up again, I tried the
following:

1) Started Zookeeper on all machines using the following command
zkServer.cmd (on all three machines)

2) Started Solr on two of those machines using

solr.cmd start -c -p 8983 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 8983 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server1_name] -z
"[server1_ip]:2181,[server2_name]:2181,[server3_name]:2181"
solr.cmd start -c -p 7574 -h [server2_name] -z
"[server2_ip]:2181,[server1_name]:2181,[server3_name]:2181"

After several trials, Solr did start on both machines, but *none of the
previous collections came back normally.* When I look at the admin page, it
shows errors such as the following:

*[Collection_name]_shard2_replica2:*
org.apache.solr.common.SolrException:org.apache.solr.common.SolrException:
Index locked for write for core '[Collection_name]_shard2_replica2'. Solr
now longer supports forceful unlocking via 'unlockOnStartup'. Please verify
locks manually!

So either I am doing something wrong, or the simple scenario is not as
straightforward to recover from as it seems.

Your comment/feedback is appreciated.

Regards,
Salman



On Thu, Apr 7, 2016 at 3:56 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 4/7/2016 5:40 AM, Salman Ansari wrote:
> > Any comments regarding the issue I mentioned above "the proper procedure
> of
> > bringing old collections up after a restart of zookeeper ensemble and
> Solr
> > instances"?
>
> What precisely do you mean by "old collections"?  The simplest
> interpretation of that is that you are trying to restart your servers
> and have everything you already had in the cloud work properly.  An
> alternate interpretation, which might be just as valid, is that you have
> some collections on some old servers that you want to incorporate into a
> new cloud.
>
> If it's the simple scenario: shut down solr, shut down zookeeper, start
> zookeeper, start solr.  If it's the other scenario, that is not quite so
> simple.
>
> Thanks,
> Shawn
>
>


Re: Distributing Collections across Shards

2016-03-30 Thread Salman Ansari
Thanks Erick for the help. Appreciate it.

Regards,
Salman

On Wed, Mar 30, 2016 at 7:29 AM, Erick Erickson <erickerick...@gmail.com>
wrote:

> Absolutely. You haven't said which version of Solr you're using,
> but there are several possibilities:
> 1> create the collection with replicationFactor=1, then use the
> ADDREPLICA command to specify exactly what node the  replicas
> for each shard are created on with the 'node' parameter.
> 2> For recent versions of Solr, you can create a collection with _no_
> replicas and then ADDREPLICA as you choose.
>
> Best,
> Erick
>
> On Tue, Mar 29, 2016 at 5:10 AM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Hi,
> >
> > I believe the default behavior of creating collections distributed across
> > shards through the following command
> >
> > http://[solrlocation]:8983/solr/admin/collections?action=CREATE&name=[collection_name]&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=[configuration_name]
> >
> > is that Solr will create the collection as follows
> >
> > *shard1: *leader in server1 and replica in server2
> > *shard2:* leader in server2 and replica in server1
> >
> > However, I have seen cases when running the above command that it creates
> > both the leader and replica on the same server.
> >
> > Wondering if there is a way to control this behavior (I mean control
> where
> > the leader and the replica of each shard will reside)?
> >
> > Regards,
> > Salman
>
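As an aside, Erick's create-then-ADDREPLICA approach above could be driven by Collections API URLs like these; the host, collection, and node names below are placeholder assumptions, while the parameter names follow the Collections API:

```python
# Build Collections API URLs for creating a collection with a single
# replica per shard and then pinning an extra replica to a chosen
# node, per the advice above. All names below are placeholders.
from urllib.parse import urlencode

def collections_api_url(base, **params):
    return f"{base}/admin/collections?{urlencode(params)}"

create = collections_api_url("http://localhost:8983/solr",
                             action="CREATE", name="mycoll",
                             numShards=2, replicationFactor=1)
add = collections_api_url("http://localhost:8983/solr",
                          action="ADDREPLICA", collection="mycoll",
                          shard="shard1", node="server2:8983_solr")
```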


Distributing Collections across Shards

2016-03-29 Thread Salman Ansari
Hi,

I believe the default behavior when creating a collection distributed
across shards through the following command

http://[solrlocation]:8983/solr/admin/collections?action=CREATE&name=[collection_name]&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=[configuration_name]

is that Solr will create the collection as follows

*shard1: *leader in server1 and replica in server2
*shard2:* leader in server2 and replica in server1

However, I have seen cases where running the above command creates both
the leader and the replica on the same server.

Is there a way to control this behavior (I mean, control where the leader
and the replica of each shard will reside)?

Regards,
Salman


Re: Problem in Issuing a Command to Upload Configuration

2016-03-29 Thread Salman Ansari
Moreover, I created those new collections as a workaround, as my past
collections were not coming up after a complete restart of the machines
hosting Zookeeper and Solr. I would be interested to know the proper
procedure for bringing old collections up after a restart of the Zookeeper
ensemble and the Solr instances.

Appreciate any feedback and comments.

Regards,
Salman


On Tue, Mar 29, 2016 at 11:53 AM, Salman Ansari <salman.rah...@gmail.com>
wrote:

> Thanks Reth for your response. It did work.
>
> Regards,
> Salman
>
> On Mon, Mar 28, 2016 at 8:01 PM, Reth RM <reth.ik...@gmail.com> wrote:
>
>> I think it should be "zkcli.bat" (all in lower case) that is shipped with
>> solr not zkCli.cmd(that is shipped with zookeeper)
>>
>> solr_home/server/scripts/cloud-scripts/zkcli.bat -zkhost 127.0.0.1:9983 \
>>-cmd upconfig -confname my_new_config -confdir
>> server/solr/configsets/basic_configs/conf
>>
>> On Mon, Mar 28, 2016 at 8:18 PM, Salman Ansari <salman.rah...@gmail.com>
>> wrote:
>>
>> > Hi,
>> >
>> > I am facing an issue uploading configuration to the Zookeeper ensemble.
>> > I am running this on Windows as follows:
>> >
>> > *Command*
>> > **
>> > zkCli.cmd -cmd upconfig -zkhost
>> > "[localserver]:2181,[second_server]:2181,[third_server]:2181" -confname
>> > [config_name]  -confdir "[config_dir]"
>> >
>> > and I got the following result
>> >
>> > *Result*
>> > =
>> > Connecting to localhost:2181
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:host.name=SabrSolrServer1.SabrSolrServer1.a2.internal.cloudapp.net
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.version=1.8.0_77
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.vendor=Oracle Corporation
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.home=C:\Program Files\Java\jre1.8.0_77
>> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> >
>> >
>> ent:java.class.path=C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\build\classes;C:\So
>> >
>> >
>> lr\Zookeeper\zookeeper-3.4.6\bin\..\build\lib\*;C:\Solr\Zookeeper\zookeeper-3.4.
>> >
>> >
>> 6\bin\..\zookeeper-3.4.6.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\jline-
>> >
>> >
>> 0.9.94.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\log4j-1.2.16.jar;C:\Solr
>> >
>> >
>> \Zookeeper\zookeeper-3.4.6\bin\..\lib\netty-3.7.0.Final.jar;C:\Solr\Zookeeper\zo
>> >
>> >
>> okeeper-3.4.6\bin\..\lib\slf4j-api-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\b
>> >
>> >
>> in\..\lib\slf4j-log4j12-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\conf
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> >
>> >
>> ent:java.library.path=C:\ProgramData\Oracle\Java\javapath;C:\Windows\Sun\Java\bi
>> >
>> >
>> n;C:\Windows\system32;C:\Windows;C:\ProgramData\Oracle\Java\javapath;C:\Windows\
>> >
>> >
>> system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
>> > ll\v1.0\;C:\Program Files\Java\JDK\bin;.
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.io.tmpdir=C:\Users\ADMIN_~1\AppData\Local\Temp\2\
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:java.compiler=
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:os.name=Windows Server 2012 R2
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:os.arch=amd64
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:os.version=6.3
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:user.name=admin_user
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:user.home=C:\Users\admin_user
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
>> > environm
>> > ent:user.dir=C:\Solr\Zookeeper\zookeeper-3.4.6\bin
>> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:ZooKeeper@438] -
>> Initiating
>> > client
>> >  connection, connectString=localhost:2181 sessionTimeout=3
>> > watcher=org.apach
>> > e.zookeeper.ZooKeeperMain$MyWatcher@506c589e
>> >
>> > It looks like it is not even executing the command. Any idea why that
>> > is happening?
>> >
>> > Regards,
>> > Salman
>> >
>>
>
>


Re: Problem in Issuing a Command to Upload Configuration

2016-03-29 Thread Salman Ansari
Thanks Reth for your response. It did work.

Regards,
Salman

On Mon, Mar 28, 2016 at 8:01 PM, Reth RM <reth.ik...@gmail.com> wrote:

> I think it should be "zkcli.bat" (all in lower case) that is shipped with
> solr not zkCli.cmd(that is shipped with zookeeper)
>
> solr_home/server/scripts/cloud-scripts/zkcli.bat -zkhost 127.0.0.1:9983 \
>-cmd upconfig -confname my_new_config -confdir
> server/solr/configsets/basic_configs/conf
>
> On Mon, Mar 28, 2016 at 8:18 PM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am facing an issue uploading configuration to the Zookeeper ensemble.
> > I am running this on Windows as follows:
> >
> > *Command*
> > **
> > zkCli.cmd -cmd upconfig -zkhost
> > "[localserver]:2181,[second_server]:2181,[third_server]:2181" -confname
> > [config_name]  -confdir "[config_dir]"
> >
> > and I got the following result
> >
> > *Result*
> > =
> > Connecting to localhost:2181
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:host.name=SabrSolrServer1.SabrSolrServer1.a2.internal.cloudapp.net
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.version=1.8.0_77
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.vendor=Oracle Corporation
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.home=C:\Program Files\Java\jre1.8.0_77
> > 2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> >
> >
> ent:java.class.path=C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\build\classes;C:\So
> >
> >
> lr\Zookeeper\zookeeper-3.4.6\bin\..\build\lib\*;C:\Solr\Zookeeper\zookeeper-3.4.
> >
> >
> 6\bin\..\zookeeper-3.4.6.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\jline-
> >
> >
> 0.9.94.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\log4j-1.2.16.jar;C:\Solr
> >
> >
> \Zookeeper\zookeeper-3.4.6\bin\..\lib\netty-3.7.0.Final.jar;C:\Solr\Zookeeper\zo
> >
> >
> okeeper-3.4.6\bin\..\lib\slf4j-api-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\b
> >
> >
> in\..\lib\slf4j-log4j12-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\conf
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> >
> >
> ent:java.library.path=C:\ProgramData\Oracle\Java\javapath;C:\Windows\Sun\Java\bi
> >
> >
> n;C:\Windows\system32;C:\Windows;C:\ProgramData\Oracle\Java\javapath;C:\Windows\
> >
> >
> system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
> > ll\v1.0\;C:\Program Files\Java\JDK\bin;.
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.io.tmpdir=C:\Users\ADMIN_~1\AppData\Local\Temp\2\
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:java.compiler=
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:os.name=Windows Server 2012 R2
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:os.arch=amd64
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:os.version=6.3
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:user.name=admin_user
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:user.home=C:\Users\admin_user
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
> > environm
> > ent:user.dir=C:\Solr\Zookeeper\zookeeper-3.4.6\bin
> > 2016-03-28 14:40:12,865 [myid:] - INFO  [main:ZooKeeper@438] -
> Initiating
> > client
> >  connection, connectString=localhost:2181 sessionTimeout=3
> > watcher=org.apach
> > e.zookeeper.ZooKeeperMain$MyWatcher@506c589e
> >
> > It looks like it is not even executing the command. Any idea why that
> > is happening?
> >
> > Regards,
> > Salman
> >
>


Problem in Issuing a Command to Upload Configuration

2016-03-28 Thread Salman Ansari
Hi,

I am facing an issue uploading configuration to the Zookeeper ensemble. I
am running this on Windows as follows:

*Command*
**
zkCli.cmd -cmd upconfig -zkhost
"[localserver]:2181,[second_server]:2181,[third_server]:2181" -confname
[config_name]  -confdir "[config_dir]"

and I got the following result

*Result*
=
Connecting to localhost:2181
2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:host.name=SabrSolrServer1.SabrSolrServer1.a2.internal.cloudapp.net
2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:java.version=1.8.0_77
2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:java.vendor=Oracle Corporation
2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:java.home=C:\Program Files\Java\jre1.8.0_77
2016-03-28 14:40:12,849 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:java.class.path=C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\build\classes;C:\So
lr\Zookeeper\zookeeper-3.4.6\bin\..\build\lib\*;C:\Solr\Zookeeper\zookeeper-3.4.
6\bin\..\zookeeper-3.4.6.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\jline-
0.9.94.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\lib\log4j-1.2.16.jar;C:\Solr
\Zookeeper\zookeeper-3.4.6\bin\..\lib\netty-3.7.0.Final.jar;C:\Solr\Zookeeper\zo
okeeper-3.4.6\bin\..\lib\slf4j-api-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\b
in\..\lib\slf4j-log4j12-1.6.1.jar;C:\Solr\Zookeeper\zookeeper-3.4.6\bin\..\conf
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:java.library.path=C:\ProgramData\Oracle\Java\javapath;C:\Windows\Sun\Java\bi
n;C:\Windows\system32;C:\Windows;C:\ProgramData\Oracle\Java\javapath;C:\Windows\
system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShe
ll\v1.0\;C:\Program Files\Java\JDK\bin;.
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:java.io.tmpdir=C:\Users\ADMIN_~1\AppData\Local\Temp\2\
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:java.compiler=
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:os.name=Windows Server 2012 R2
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:os.arch=amd64
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:os.version=6.3
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:user.name=admin_user
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:user.home=C:\Users\admin_user
2016-03-28 14:40:12,865 [myid:] - INFO  [main:Environment@100] - Client
environm
ent:user.dir=C:\Solr\Zookeeper\zookeeper-3.4.6\bin
2016-03-28 14:40:12,865 [myid:] - INFO  [main:ZooKeeper@438] - Initiating
client
 connection, connectString=localhost:2181 sessionTimeout=3
watcher=org.apach
e.zookeeper.ZooKeeperMain$MyWatcher@506c589e

It looks like it is not even executing the command. Any idea why that is
happening?

Regards,
Salman
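As the replies above note, the fix is to use the zkcli script shipped with Solr rather than ZooKeeper's zkCli. Assembling that upconfig invocation could be sketched as follows; the script path, hosts, and names are placeholders:

```python
# Assemble the argument list for Solr's zkcli script (zkcli.bat on
# Windows) to upload a configset, mirroring the upconfig command
# discussed in this thread. Paths and names are placeholders; the
# resulting list can be passed to subprocess.run() against a real
# installation.
def upconfig_args(zkcli_path, zkhost, confname, confdir):
    return [zkcli_path, "-zkhost", zkhost,
            "-cmd", "upconfig",
            "-confname", confname, "-confdir", confdir]
```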


Issue With Manual Lock

2016-03-23 Thread Salman Ansari
Hi,

I am facing an issue which I believe has something to do with recent
changes in Solr. I already have a collection spread across 2 shards (each
with 2 replicas). What happened is that my Solr and Zookeeper ensemble went
down and I restarted the servers. I performed the following steps:

1) I restarted the machine and performed Windows update
2) I started Zookeeper ensemble
3) Then I started Solr instances

My issues (for collections that existed before restarting the Solr
servers) are:

1) From time to time, I see some replicas are down on Solr dashboard
2) When I try to index some documents, I face the following exception:

SolrNet.Exceptions.SolrConnectionException was unhandled by user code

  HResult=-2146232832

  Message=


5001021{msg=SolrCore '[myCollection]_shard1_replica1' is not available
due to init failure: Index locked for write for core
'[myCollection]_shard1_replica1'. Solr now longer supports forceful
unlocking via 'unlockOnStartup'. Please verify locks
manually!,trace=org.apache.solr.common.SolrException: SolrCore
'[myCollection]_shard1_replica1' is not available due to init failure:
Index locked for write for core '[myCollection]_shard1_replica1'. Solr now
longer supports forceful unlocking via 'unlockOnStartup'. Please verify
locks manually!??

I have tried several steps, including:

1) I removed the write.lock file manually from the core folders while Solr
was up, and I also tried reloading the core while Solr was up, but nothing
changed (some replicas are still down)
2) I restarted the Solr instances, but now all replicas are down :)

Any idea how to handle this issue?

Appreciate your comments/feedback.

Regards,
Salman


Issue Running Solr

2016-03-21 Thread Salman Ansari
Hi,

I am facing an issue running the Solr server. I tried different approaches
and still receive the following error:

"ERROR: Solr at http://localhost:8983/solr did not come online within 30
seconds"

I tried running the following commands

1) solr -e cloud
2) solr.cmd start -cloud -p 8983 -s
"C:\Solr\Solr-5.3.1\solr-5.3.1\example\cloud\node1" -h [myserver] -z
"[server_ip]:2181,[server2_hostname]:2181,[server3_hostname]:2181"

I tried running the commands multiple times but still get the same result.

Are there any possible reasons why I keep receiving this error? Any
possible solutions?

Note: I tried this after a Windows update and restarting.

Regards,
Salman
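For what it's worth, the 30-second check behind this error is a startup poll; a rough, hypothetical equivalent (URL and timings are assumptions) that can be pointed at a slow-starting node with a longer timeout:

```python
# Poll a Solr base URL until it answers or a deadline passes,
# similar in spirit to the startup check behind the error above.
import time
import urllib.error
import urllib.request

def wait_for_solr(url, timeout=30.0, interval=1.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.getcode() == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not up yet; try again after a short sleep
        time.sleep(interval)
    return False
```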


Re: Querying data based on field type

2016-02-18 Thread Salman Ansari
Not sure if I am getting this, but I am not interested in updating
documents. I am interested in getting documents where the value of a
specific field was indexed as an array (<arr>).

Regards,
Salman

On Thu, Feb 18, 2016 at 11:13 AM, Binoy Dalal <binoydala...@gmail.com>
wrote:

> Take a look at atomic updates and remove regex.
>
> https://cwiki.apache.org/confluence/display/solr/Updating+Parts+of+Documents
>
> On Thu, 18 Feb 2016, 13:07 Salman Ansari <salman.rah...@gmail.com> wrote:
>
> > Hi,
> >
> > Due to some mis-configuration issues, I have a field that has values as
> > single string and an array of strings. Looks like there are some old
> values
> > that got indexed as an array of strings while anything new are single
> > valued string. I have checked the configuration and multivalued for that
> > field is set to false. What I want is to remove all the occurrences of
> the
> > field as an array (multi-valued) where it shows as <arr> instead of
> > <str>. Is there a way to query the field so it returns only those
> > documents that have field as an array and not as a single string?
> >
> > Appreciate your comments/feedback.
> >
> > Regards,
> > Salman
> >
> --
> Regards,
> Binoy Dalal
>


Querying data based on field type

2016-02-17 Thread Salman Ansari
Hi,

Due to some misconfiguration issues, I have a field that has values both
as a single string and as an array of strings. It looks like some old
values got indexed as an array of strings, while anything new is a
single-valued string. I have checked the configuration, and multiValued
for that field is set to false. What I want is to remove all occurrences
where the field is an array (multi-valued), i.e. where it shows as <arr>
instead of <str>. Is there a way to query the field so it returns only
those documents that have the field as an array and not as a single string?

Appreciate your comments/feedback.

Regards,
Salman


Re: Negating multiple array fields

2016-02-17 Thread Salman Ansari
Thanks Shawn for explaining in detail.
Regarding the performance issues you mentioned, there are 2 points:

1) "The [* TO *] syntax is an all-inclusive range query, which will usually be
much faster than a wildcard query."

I will take your statement at face value and leave room for others to
comment on the details behind this.

2) "Behind the scenes, Solr will interpret this as "all possible values for
field" --which sounds like it would be exactly what you're looking for,
except that if there are ten million possible values in the field
you're searching,
the constructed Lucene query will quite literally include all ten million
values."

Does that mean that the [* TO *] syntax does not return all results?

Regards,

Salman
On Feb 17, 2016 6:29 AM, "Binoy Dalal"  wrote:

> Hi Shawn,
> Please correct me If I'm wrong here, but don't the all inclusive range
> query [* TO *] and an only wildcard query like the one above essentially do
> the same thing from a black box perspective?
> In such a case wouldn't it be better to default an only wildcard query to
> an all inclusive range query?
>
> On Wed, 17 Feb 2016, 06:47 Shawn Heisey  wrote:
>
> > On 2/15/2016 9:22 AM, Jack Krupansky wrote:
> > > I should also have noted that your full query:
> > >
> > > (-persons:*)AND(-places:*)AND(-orgs:*)
> > >
> > > can be written as:
> > >
> > > -persons:* -places:* -orgs:*
> > >
> > > Which may work as is, or can also be written as:
> > >
> > > *:* -persons:* -places:* -orgs:*
> >
> > Salman,
> >
> > One fact of Lucene operation is that purely negative queries do not
> > work.  A negative query clause is like a subtraction.  If you make a
> > query that only says "subtract these values", then you aren't going to
> > get anything, because you did not start with anything.
> >
> > Adding the "*:*" clause at the beginning of the query says "start with
> > everything."
> >
> > You might ask why a query of -field:value works, when I just said that
> > it *won't* work.  This is because Solr has detected the problem and
> > fixed it.  When the query is very simple (a single negated clause), Solr
> > is able to detect the unworkable situation and implicitly add the "*:*"
> > starting point, producing the expected results.  With more complex
> > queries, like the one you are trying, this detection fails, and the
> > query is executed as-is.
> >
> > Jack is an awesome member of this community.  I do not want to disparage
> > him at all when I tell you that the rewritten query he provided will
> > work, but is not optimal.  It can be optimized as the following:
> >
> > *:* -persons:[* TO *] -places:[* TO *] -orgs:[* TO *]
> >
> > A query clause of the format "field:*" is a wildcard query.  Behind the
> > scenes, Solr will interpret this as "all possible values for field" --
> > which sounds like it would be exactly what you're looking for, except
> > that if there are ten million possible values in the field you're
> > searching, the constructed Lucene query will quite literally include all
> > ten million values.  Wildcard queries tend to use a lot of memory and
> > run slowly.
> >
> > The [* TO *] syntax is an all-inclusive range query, which will usually
> > be much faster than a wildcard query.
> >
> > Thanks,
> > Shawn
> >
> > --
> Regards,
> Binoy Dalal
>
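As an aside, the optimized query Shawn describes above can be generated mechanically; a small sketch using the field names from this thread:

```python
# Build a q string matching documents where none of the given fields
# have a value: prepend *:* so the purely negative query is valid,
# and use [* TO *] range clauses instead of slower field:* wildcards.
def missing_all_fields(fields):
    negations = " ".join(f"-{f}:[* TO *]" for f in fields)
    return f"*:* {negations}"
```

For the three fields in this thread, this produces exactly the query Shawn suggests.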


Re: Negating multiple array fields

2016-02-14 Thread Salman Ansari
@Binoy: The query does work for one term (-persons:[* TO *]), but it does
not work for multiple terms, such as
http://[Myserver]/solr/[Collection]/select?q=(-persons:[* TO *])AND(-orgs:[* TO *])
This returns zero records, although I do have records where both persons
and orgs are empty.

@Jack: Replacing (-persons:*)AND(-orgs:*) with (*:* -persons:*)AND(*:*
-orgs:*) did the trick. Thanks.

Thanks you both for your comments.

Salman

On Sun, Feb 14, 2016 at 7:51 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Due to a bug (or poorly designed feature), you need to explicitly include a
> non-negative query term in a purely negative sub-query. Usually this means
> using *:* to select all documents. Note that the use of parentheses
> introduces a sub-query. So, (-persons:*) s.b. (*:* -persons:*).
>
> -- Jack Krupansky
>
> On Sun, Feb 14, 2016 at 8:21 AM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I think what I am asking should be easy to do but for some reasons I am
> > facing issues in making that happen. The issue is that I want
> > include/exclude some fields from my Solr query. All the fields that I
> need
> > to include are multi valued int fields. When I include the fields I have
> > the following query
> >
> > http://[MySolrServer]/solr/[Collection]/select?q=(persons:*)AND(places:*)AND(orgs:*)
> > This does return the desired result. However, when I negate the values
> >
> > http://[MySolrServer]/solr/[Collection]/select?q=(-persons:*)AND(-places:*)AND(-orgs:*)
> > This returns 0 documents although there are a lot of documents that have
> > all those fields empty.
> >
> > Any ideas why this is happening?
> >
> > Appreciate any comments/feedback.
> >
> > Regards,
> > Salman
> >
>


Negating multiple array fields

2016-02-14 Thread Salman Ansari
Hi,

I think what I am asking should be easy to do, but for some reason I am
facing issues in making it happen. The issue is that I want to
include/exclude some fields from my Solr query. All the fields that I need
to include are multi-valued int fields. When I include the fields, I have
the following query:

http://[MySolrServer]/solr/[Collection]/select?q=(persons:*)AND(places:*)AND(orgs:*)

This does return the desired result. However, when I negate the values:

http://[MySolrServer]/solr/[Collection]/select?q=(-persons:*)AND(-places:*)AND(-orgs:*)

This returns 0 documents, although there are a lot of documents that have
all those fields empty.

Any ideas why this is happening?

Appreciate any comments/feedback.

Regards,
Salman


Re: URI is too long

2016-02-06 Thread Salman Ansari
It turned out there was another issue with my query: I had too many
boolean clauses (the maxBooleanClauses setting in solrconfig.xml). I just
looped in batches of 1,000 to get all the docs. Not sure if there is a
better way of handling this.

Regards,
Salman
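The batching workaround described above might look like the following sketch; the 1,000 batch size is the one mentioned in the message, which keeps each query under the default maxBooleanClauses limit of 1,024:

```python
# Split a long list of OR terms into batches so each generated query
# stays under the maxBooleanClauses limit, as described above.
def batched_or_queries(field, values, batch_size=1000):
    for i in range(0, len(values), batch_size):
        chunk = values[i:i + batch_size]
        yield " OR ".join(f"{field}:{v}" for v in chunk)
```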


On Wed, Feb 3, 2016 at 12:29 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 2/2/2016 1:46 PM, Salman Ansari wrote:
> > OK then, if there is no way around this problem, can someone tell me the
> > maximum size a POST body can handle in Solr?
>
> It is configurable in solrconfig.xml.  Look for the
> formdataUploadLimitInKB setting in the 5.x configsets.  This setting
> defaults to 2048, which means 2 megabytes.
>
> Thanks,
> Shawn
>
>


Re: URI is too long

2016-02-02 Thread Salman Ansari
OK then, if there is no way around this problem, can someone tell me the
maximum size of POST body that Solr can handle?

Regards,
Salman

On Tue, Feb 2, 2016 at 12:12 AM, Salman Ansari <salman.rah...@gmail.com>
wrote:

> That is what I have tried. I tried using POST with
> application/x-www-form-urlencoded and I got the exception I mentioned. Is
> there a way I can get around this exception?
>
> Regards,
> Salman
>
> On Mon, Feb 1, 2016 at 6:08 PM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
>> Post is pretty much similar to GET. You can use any REST Client to try.
>> Same select URL & pass below header and put the queries parameters into
>> body
>>
>> POST:  http://localhost:8983/solr/techproducts/select
>>
>> Header
>> ==
>> Content-Type:application/x-www-form-urlencoded
>>
>> payload/body:
>> ==
>> q=*:*&rows=2
>>
>>
>> Thanks,
>> Susheel
>>
>> On Mon, Feb 1, 2016 at 2:38 AM, Salman Ansari <salman.rah...@gmail.com>
>> wrote:
>>
>> > Cool. I would give POST a try. Any samples of using Post while passing
>> the
>> > query string values (such as ORing between Solr field values) using
>> > Solr.NET?
>> >
>> > Regards,
>> > Salman
>> >
>> > On Sun, Jan 31, 2016 at 10:21 PM, Shawn Heisey <apa...@elyograg.org>
>> > wrote:
>> >
>> > > On 1/31/2016 7:20 AM, Salman Ansari wrote:
>> > > > I am building a long query containing multiple ORs between query
>> > terms. I
>> > > > started to receive the following exception:
>> > > >
>> > > > The remote server returned an error: (414) Request-URI Too Long. Any
>> > idea
>> > > > what is the limit of the URL in Solr? Moreover, as a solution I was
>> > > > thinking of chunking the query into multiple requests but I was
>> > wondering
>> > > > if anyone has a better approach?
>> > >
>> > > The default HTTP header size limit on most webservers and containers
>> > > (including the Jetty that ships with Solr) is 8192 bytes.  A typical
>> > > request like this will start with "GET " and end with " HTTP/1.1",
>> which
>> > > count against that 8192 bytes.  The max header size can be increased.
>> > >
>> > > If you place the parameters into a POST request instead of on the URL,
>> > > then the default size limit of that POST request in Solr is 2MB.  This
>> > > can also be increased.
>> > >
>> > > Thanks,
>> > > Shawn
>> > >
>> > >
>> >
>>
>
>


Re: URI is too long

2016-02-01 Thread Salman Ansari
That is what I have tried. I tried using POST with
application/x-www-form-urlencoded and I got the exception I mentioned. Is
there a way I can get around this exception?

Regards,
Salman

On Mon, Feb 1, 2016 at 6:08 PM, Susheel Kumar <susheel2...@gmail.com> wrote:

> Post is pretty much similar to GET. You can use any REST Client to try.
> Same select URL & pass below header and put the queries parameters into
> body
>
> POST:  http://localhost:8983/solr/techproducts/select
>
> Header
> ==
> Content-Type:application/x-www-form-urlencoded
>
> payload/body:
> ==
> q=*:*&rows=2
>
>
> Thanks,
> Susheel
>
> On Mon, Feb 1, 2016 at 2:38 AM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Cool. I would give POST a try. Any samples of using Post while passing
> the
> > query string values (such as ORing between Solr field values) using
> > Solr.NET?
> >
> > Regards,
> > Salman
> >
> > On Sun, Jan 31, 2016 at 10:21 PM, Shawn Heisey <apa...@elyograg.org>
> > wrote:
> >
> > > On 1/31/2016 7:20 AM, Salman Ansari wrote:
> > > > I am building a long query containing multiple ORs between query
> > terms. I
> > > > started to receive the following exception:
> > > >
> > > > The remote server returned an error: (414) Request-URI Too Long. Any
> > idea
> > > > what is the limit of the URL in Solr? Moreover, as a solution I was
> > > > thinking of chunking the query into multiple requests but I was
> > wondering
> > > > if anyone has a better approach?
> > >
> > > The default HTTP header size limit on most webservers and containers
> > > (including the Jetty that ships with Solr) is 8192 bytes.  A typical
> > > request like this will start with "GET " and end with " HTTP/1.1",
> which
> > > count against that 8192 bytes.  The max header size can be increased.
> > >
> > > If you place the parameters into a POST request instead of on the URL,
> > > then the default size limit of that POST request in Solr is 2MB.  This
> > > can also be increased.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>
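Susheel's recipe above (same select URL, parameters moved into a form-encoded body) can be sketched with Python's standard library. The URL and core name come from his example; the client code itself is hypothetical — the thread actually uses Solr.NET:

```python
from urllib.parse import urlencode
from urllib.request import Request

# Build the request Susheel describes: the query parameters go in the
# form-encoded body instead of the query string.
params = {"q": "*:*", "rows": "2"}
body = urlencode(params).encode("utf-8")

req = Request(
    "http://localhost:8983/solr/techproducts/select",
    data=body,  # the presence of a body makes urllib issue a POST
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)

print(req.get_method())  # POST
print(body.decode())     # q=%2A%3A%2A&rows=2
```

Any HTTP client works the same way; the only essentials are the POST method, the Content-Type header, and the encoded parameters in the body.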


Re: URI is too long

2016-02-01 Thread Salman Ansari
I tried using POST but faced an issue where I am still not able to send
long data. When I send data in the body that exceeds 35KB I get the
following exception:

"An exception of type 'SolrNet.Exceptions.SolrConnectionException' occurred
in [Myproject] but was not handled in user code

Additional information: The request was aborted: The connection was closed
unexpectedly."

Any ideas why this is happening and how to resolve this?

Regards,
Salman


On Mon, Feb 1, 2016 at 2:15 PM, Upayavira <u...@odoko.co.uk> wrote:

> POST is supposed (as defined by REST) to imply a request with
> side-effects. A query as such does not have side effects, so
> conceptually, it should be a GET. In practice, whilst it might cause
> some developers to grumble, using a POST for a request should make no
> difference to Solr (other than accepting a larger query).
>
> Upayavira
>
> On Mon, Feb 1, 2016, at 11:05 AM, Midas A wrote:
> > Is there any drawback of POST request and why we prefer GET.
> >
> > On Mon, Feb 1, 2016 at 1:08 PM, Salman Ansari <salman.rah...@gmail.com>
> > wrote:
> >
> > > Cool. I would give POST a try. Any samples of using Post while passing
> the
> > > query string values (such as ORing between Solr field values) using
> > > Solr.NET?
> > >
> > > Regards,
> > > Salman
> > >
> > > On Sun, Jan 31, 2016 at 10:21 PM, Shawn Heisey <apa...@elyograg.org>
> > > wrote:
> > >
> > > > On 1/31/2016 7:20 AM, Salman Ansari wrote:
> > > > > I am building a long query containing multiple ORs between query
> > > terms. I
> > > > > started to receive the following exception:
> > > > >
> > > > > The remote server returned an error: (414) Request-URI Too Long.
> Any
> > > idea
> > > > > what is the limit of the URL in Solr? Moreover, as a solution I was
> > > > > thinking of chunking the query into multiple requests but I was
> > > wondering
> > > > > if anyone has a better approach?
> > > >
> > > > The default HTTP header size limit on most webservers and containers
> > > > (including the Jetty that ships with Solr) is 8192 bytes.  A typical
> > > > request like this will start with "GET " and end with " HTTP/1.1",
> which
> > > > count against that 8192 bytes.  The max header size can be increased.
> > > >
> > > > If you place the parameters into a POST request instead of on the
> URL,
> > > > then the default size limit of that POST request in Solr is 2MB.
> This
> > > > can also be increased.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
>


Re: URI is too long

2016-01-31 Thread Salman Ansari
Cool. I would give POST a try. Any samples of using Post while passing the
query string values (such as ORing between Solr field values) using
Solr.NET?

Regards,
Salman

On Sun, Jan 31, 2016 at 10:21 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 1/31/2016 7:20 AM, Salman Ansari wrote:
> > I am building a long query containing multiple ORs between query terms. I
> > started to receive the following exception:
> >
> > The remote server returned an error: (414) Request-URI Too Long. Any idea
> > what is the limit of the URL in Solr? Moreover, as a solution I was
> > thinking of chunking the query into multiple requests but I was wondering
> > if anyone has a better approach?
>
> The default HTTP header size limit on most webservers and containers
> (including the Jetty that ships with Solr) is 8192 bytes.  A typical
> request like this will start with "GET " and end with " HTTP/1.1", which
> count against that 8192 bytes.  The max header size can be increased.
>
> If you place the parameters into a POST request instead of on the URL,
> then the default size limit of that POST request in Solr is 2MB.  This
> can also be increased.
>
> Thanks,
> Shawn
>
>
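Shawn's arithmetic is easy to reproduce: the whole request line ("GET " + path-plus-query + " HTTP/1.1") counts against Jetty's default 8192-byte header limit. The term list below is made up purely for illustration:

```python
from urllib.parse import urlencode

# A long OR query like the one described in this thread.
terms = ["term%04d" % i for i in range(1000)]
query = " OR ".join(terms)
qs = urlencode({"q": query})

# The request line that would be sent for a GET.
request_line = "GET /solr/collection1/select?" + qs + " HTTP/1.1"
print(len(request_line) > 8192)  # True: this query no longer fits in a GET
```

The same parameters sent as a POST body fall under Solr's separate (2MB default) form-size limit instead.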


URI is too long

2016-01-31 Thread Salman Ansari
Hi,

I am building a long query containing multiple ORs between query terms. I
started to receive the following exception:

The remote server returned an error: (414) Request-URI Too Long. Any idea
what is the limit of the URL in Solr? Moreover, as a solution I was
thinking of chunking the query into multiple requests but I was wondering
if anyone has a better approach?

Regards,
Salman
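The chunking fallback mentioned in the question can be sketched as a hypothetical helper: split the ORed terms into groups so each encoded query stays under a byte budget, then run one request per chunk and merge client-side:

```python
from urllib.parse import urlencode

def chunk_or_query(terms, max_bytes=8000):
    """Split terms into groups so each OR query stays under max_bytes
    when URL/form encoded. The per-chunk results must then be merged
    client-side (budget of 8000 leaves headroom under the 8192 limit)."""
    chunks, current = [], []
    for term in terms:
        candidate = current + [term]
        qs = urlencode({"q": " OR ".join(candidate)})
        if current and len(qs) > max_bytes:
            chunks.append(current)   # flush the chunk that still fit
            current = [term]
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

queries = chunk_or_query(["term%04d" % i for i in range(1000)])
print(len(queries))  # several sub-queries instead of one oversized URL
```

In practice the POST approach discussed later in the thread is simpler; chunking only helps when the server's POST limit cannot be raised.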


Returning all documents in a collection

2016-01-20 Thread Salman Ansari
Hi,

I am looking for a way to return all documents from a collection.
Currently, I am restricted to specifying the number of rows using Solr.NET
but I am looking for a better approach to actually return all documents. If
I specify a huge number such as 1M, the processing takes a long time.

Any feedback/comment will be appreciated.

Regards,
Salman


Re: Returning all documents in a collection

2016-01-20 Thread Salman Ansari
Thanks Emir, Susheel and Jack for your responses. Just to update, I am
using Solr Cloud plus I want to get the data completely without pagination
or cursor (I mean in one shot). Is there a way to do this in Solr?

Regards,
Salman

On Wed, Jan 20, 2016 at 4:49 PM, Jack Krupansky <jack.krupan...@gmail.com>
wrote:

> Yes, Exporting Results Sets is the preferred and recommended technique for
> returning all documents in a collection, or even simply for queries that
> select a large number of documents, all of which are to be returned. It
> uses efficient streaming rather than paging.
>
> But... this great feature currently does not have support for
> distributed/SolrCloud mode:
> "The initial release treats all queries as non-distributed requests. So the
> client is responsible for making the calls to each Solr instance and
> merging the results.
> Using SolrJ’s CloudSolrClient as a model, developers could build clients
> that automatically send requests to all the shards in a collection (or
> multiple collections) and then merge the sorted sets any way they wish."
>
> -- Jack Krupansky
>
> On Wed, Jan 20, 2016 at 8:41 AM, Susheel Kumar <susheel2...@gmail.com>
> wrote:
>
> > Hello Salman,
> >
> > Please checkout the export functionality
> > https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> >
> > Thanks,
> > Susheel
> >
> > On Wed, Jan 20, 2016 at 6:57 AM, Emir Arnautovic <
> > emir.arnauto...@sematext.com> wrote:
> >
> > > Hi Salman,
> > > You should use cursors in order to avoid "deep paging issues". Take a
> > look
> > > at
> > https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results.
> > >
> > > Regards,
> > > Emir
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > >
> > >
> > > On 20.01.2016 12:55, Salman Ansari wrote:
> > >
> > >> Hi,
> > >>
> > >> I am looking for a way to return all documents from a collection.
> > >> Currently, I am restricted to specifying the number of rows using
> > Solr.NET
> > >> but I am looking for a better approach to actually return all
> documents.
> > >> If
> > >> I specify a huge number such as 1M, the processing takes a long time.
> > >>
> > >> Any feedback/comment will be appreciated.
> > >>
> > >> Regards,
> > >> Salman
> > >>
> > >>
> > >
> >
>
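The cursor-based paging Emir links to can be sketched as a loop. `search` here is a stand-in for whatever client call is used (Solr.NET, raw HTTP), not a real API; the essentials are sorting on the uniqueKey, starting with cursorMark=*, and stopping when the cursor stops changing:

```python
def fetch_all(search, rows=500):
    """Page through an entire result set with cursorMark.
    `search` is a stand-in for your Solr client: it takes a params dict
    and returns (docs, next_cursor_mark). The sort must include the
    uniqueKey field for cursors to work."""
    cursor = "*"
    while True:
        docs, next_cursor = search({
            "q": "*:*",
            "sort": "id asc",      # must include the uniqueKey
            "rows": rows,
            "cursorMark": cursor,
        })
        yield from docs
        if next_cursor == cursor:  # unchanged cursor means no more pages
            break
        cursor = next_cursor

# Toy stand-in serving 1050 fake docs in pages, to show the loop terminates:
def fake_search(params, _data=list(range(1050))):
    start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
    page = _data[start:start + params["rows"]]
    next_mark = str(start + len(page)) if page else params["cursorMark"]
    return page, next_mark

print(sum(1 for _ in fetch_all(fake_search)))  # 1050
```

This still issues multiple requests under the hood, but the caller sees one continuous stream, which is usually what "in one shot" needs in practice.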


Re: Changing Solr Schema with Data

2015-12-30 Thread Salman Ansari
Thanks.

Salman

On Tue, Dec 29, 2015 at 9:33 PM, Binoy Dalal <binoydala...@gmail.com> wrote:

> What Shalin says is solid and will work with Solr 5.x as well as 3.x.
> You could do a little POC if you want to be absolutely certain. Shouldn't
> take you very long.
> Your only concern will be that your old docs won't match queries
> against the newly added fields.
>
> On Tue, 29 Dec 2015, 23:38 Salman Ansari <salman.rah...@gmail.com> wrote:
>
> > Thanks guys for your responses.
> >
> > @Shalin: Do you have a documentation that explains this? Moreover, is it
> > only for Solr 5+ or is it still applicable to Solr 3+? I am asking this
> as
> > I am working in a team and in some of our projects we are using old Solr
> > versions and I need to convince the guys that this is possible in the old
> > Solr as well.
> >
> > Thanks for your help.
> >
> > Regards,
> > Salman
> >
> >
> > On Tue, Dec 29, 2015 at 9:44 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > > Adding new fields is not a problem. You can continue to use your
> > > existing index with the new schema.
> > >
> > > On Tue, Dec 29, 2015 at 1:58 AM, Salman Ansari <
> salman.rah...@gmail.com>
> > > wrote:
> > > > You can say that we are not removing any fields (so the old data
> should
> > > not
> > > > get affected), however, we need to add new fields (which new data
> will
> > > > have). Does that answer your question?
> > > >
> > > >
> > > > Regards,
> > > > Salman
> > > >
> > > > On Mon, Dec 28, 2015 at 9:58 PM, Alexandre Rafalovitch <
> > > arafa...@gmail.com>
> > > > wrote:
> > > >
> > > >> Does the schema change affect the data you want to keep?
> > > >> 
> > > >> Newsletter and resources for Solr beginners and intermediates:
> > > >> http://www.solr-start.com/
> > > >>
> > > >>
> > > >> On 29 December 2015 at 01:48, Salman Ansari <
> salman.rah...@gmail.com>
> > > >> wrote:
> > > >> > Hi,
> > > >> >
> > > >> > I am facing an issue where I need to change Solr schema but I have
> > > >> crucial
> > > >> > data that I don't want to delete. Is there a way where I can
> change
> > > the
> > > >> > schema of the index while keeping the data intact?
> > > >> >
> > > >> > Regards,
> > > >> > Salman
> > > >>
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
> --
> Regards,
> Binoy Dalal
>


Re: Changing Solr Schema with Data

2015-12-29 Thread Salman Ansari
Thanks guys for your responses.

@Shalin: Do you have a documentation that explains this? Moreover, is it
only for Solr 5+ or is it still applicable to Solr 3+? I am asking this as
I am working in a team and in some of our projects we are using old Solr
versions and I need to convince the guys that this is possible in the old
Solr as well.

Thanks for your help.

Regards,
Salman


On Tue, Dec 29, 2015 at 9:44 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Adding new fields is not a problem. You can continue to use your
> existing index with the new schema.
>
> On Tue, Dec 29, 2015 at 1:58 AM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > You can say that we are not removing any fields (so the old data should
> not
> > get affected), however, we need to add new fields (which new data will
> > have). Does that answer your question?
> >
> >
> > Regards,
> > Salman
> >
> > On Mon, Dec 28, 2015 at 9:58 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> Does the schema change affect the data you want to keep?
> >> 
> >> Newsletter and resources for Solr beginners and intermediates:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 29 December 2015 at 01:48, Salman Ansari <salman.rah...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I am facing an issue where I need to change Solr schema but I have
> >> crucial
> >> > data that I don't want to delete. Is there a way where I can change
> the
> >> > schema of the index while keeping the data intact?
> >> >
> >> > Regards,
> >> > Salman
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
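For what it's worth, on recent Solr versions with a managed schema the same change can be made over HTTP with the Schema API (with a classic schema.xml you edit the file and reload the collection instead). The field name and type below are made up for illustration:

```python
import json

# Hypothetical add-field payload for the Schema API (managed schema).
# POST this body to http://localhost:8983/solr/<collection>/schema
# with Content-Type: application/json.
payload = {
    "add-field": {
        "name": "new_field_s",
        "type": "string",
        "stored": True,
        "indexed": True,
    }
}
body = json.dumps(payload)
print(body)
```

Existing documents are untouched; they simply have no value in the new field, which matches Binoy's caveat about old docs not matching queries on it.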


Changing Solr Schema with Data

2015-12-28 Thread Salman Ansari
Hi,

I am facing an issue where I need to change Solr schema but I have crucial
data that I don't want to delete. Is there a way where I can change the
schema of the index while keeping the data intact?

Regards,
Salman


Re: Changing Solr Schema with Data

2015-12-28 Thread Salman Ansari
You can say that we are not removing any fields (so the old data should not
get affected), however, we need to add new fields (which new data will
have). Does that answer your question?


Regards,
Salman

On Mon, Dec 28, 2015 at 9:58 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Does the schema change affect the data you want to keep?
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 29 December 2015 at 01:48, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Hi,
> >
> > I am facing an issue where I need to change Solr schema but I have
> crucial
> > data that I don't want to delete. Is there a way where I can change the
> > schema of the index while keeping the data intact?
> >
> > Regards,
> > Salman
>


Multiple Unique Keys

2015-12-23 Thread Salman Ansari
Hi,

I am wondering if I can specify multiple unique keys in the same document
in Solr. My scenario is that I want to integrate with another system that
has an ID and our system has a reference number (auto-generated for each
document on the fly) as well that is unique.

What I am trying to achieve is to have uniqueness applied on both "ID" and
"Reference Number" so if I get a duplicate document from the source (which
will have the same ID) I want to override my existing document. What I am
not sure about is

1) Does Solr support multiple unique keys for a document?
2) What if the "ID" was the same but we generated a different "Reference
Number", will that override the existing document? (I mean one field among
the unique field is the same but the other is not)

Appreciate your feedback and comments.

Regards,
Salman
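Solr's schema allows only one uniqueKey field, so the usual pattern for this scenario is to make the external system's ID the uniqueKey (re-ingested duplicates then overwrite the old copy) and keep the reference number as an ordinary stored field. A client-side sketch, with made-up field names:

```python
import itertools

# Hypothetical reference-number generator (auto-generated per document).
_ref_counter = itertools.count(1)

def build_doc(external_id, body):
    """The external system's ID becomes Solr's single uniqueKey ("id"),
    so sending the same source document again overwrites the old copy.
    The reference number is just a stored field, not a key."""
    return {
        "id": external_id,  # uniqueKey: controls add-vs-overwrite
        "reference_number_s": f"REF-{next(_ref_counter):06d}",
        "body_t": body,
    }

first = build_doc("ext-42", "hello")
dup = build_doc("ext-42", "hello again")
print(first["id"] == dup["id"])  # True: same key, second add overwrites
```

With this layout a differing reference number never prevents the overwrite, since Solr only compares the uniqueKey.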


Re: Solr Auto-Complete

2015-12-08 Thread Salman Ansari
Thanks Alexandre. I think it is clear.

On Sun, Dec 6, 2015 at 5:21 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> For suffix matches, you copy text the field and in the different type add
> string reversal for both index and query portions. So you are doing prefix
> matching algorithm but on reversed strings.
>
> I can dig up an example if it is not clear.
> On 6 Dec 2015 8:06 am, "Salman Ansari" <salman.rah...@gmail.com> wrote:
>
> > That is right. I am actually looking for phrase prefixes not each term
> > prefix within the phrase. That satisfies my requirements. However, my
> > additional question was how do I manipulate the fieldType to later allow
> > for suffix matches as well? or will that be a completely different
> > fieldType definition?
> >
> > Regards,
> > Salman
> >
> >
> > On Sun, Dec 6, 2015 at 2:12 PM, Andrea Gazzarini <a.gazzar...@gmail.com>
> > wrote:
> >
> > > Sorry, my damned mobile: "Is that close to what you were looking for?"
> > >
> > > 2015-12-06 12:07 GMT+01:00 Andrea Gazzarini <a.gazzar...@gmail.com>:
> > >
> > > > Do you mean "phrase" or "term" prefixes? If you try to put a field
> > value
> > > > (two or more terms) in the analysis page you will see what the index
> > > > analyzer chain (of my example field type) is doing. The whole value
> is
> > > > managed as a single-ngrammed token, so you will get only a phrase
> > prefix
> > > > search, as in your request.
> > > >
> > > > If you want to manage also terms prefixes, I would also index another
> > > > field (similar to the example you posted); then, the search handler
> > with
> > > > e(dismax) would have something like this:
> > > >
> > > >
> > > > <str name="qf">
> > > >     text_suggestion_phrase_prefix_search^b1
> > > >     text_suggestion_terms_prefix_search^b2
> > > > </str>
> > > >
> > > >
> > > > b1 and b2 values strictly depend on your search logic.
> > > >
> > > > Is that close that what you were looking for?
> > > >
> > > > Best,
> > > > Andrea
> > > >
> > > >
> > > >
> > > > 2015-12-06 11:53 GMT+01:00 Salman Ansari <salman.rah...@gmail.com>:
> > > >
> > > >> Thanks a lot Andrea. It did work.
> > > >>
> > > >> However, just for my understanding, can you please explain more how
> > did
> > > >> you
> > > >> make it work for prefixes. I know you mentioned using another
> > Tokenizer
> > > >> but
> > > >> for example, if I want to tweak it later on to work on suffixes or
> > > within
> > > >> phrases how should I go about that?
> > > >>
> > > >> Thanks again for your help.
> > > >>
> > > >> Regards,
> > > >> Salman
> > > >>
> > > >>
> > > >> On Sun, Dec 6, 2015 at 1:24 PM, Andrea Gazzarini <
> > a.gazzar...@gmail.com
> > > >
> > > >> wrote:
> > > >>
> > > >> > Hi Salman,
> > > >> > that's because you're using a StandardTokenizer. Try with
> something
> > > like
> > > >> > this (copied, pasted and changed using my phone so probably with a
> > lot
> > > >> of
> > > >> > mistakes ;) but you should be able to get what I mean). BTW I
> don't
> > > >> know if
> > > >> > that's the case but I would also put a MappingCharFilterFactory
> > > >> >
> > > >> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> > > >> >     <analyzer type="index">
> > > >> >         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
> > > >> >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> > > >> >         <filter class="solr.LowerCaseFilterFactory"/>
> > > >> >         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateAll="1" splitOnCaseChange="0"/>
> > > >> >         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
> > > >> >     </analyzer>
> > > >> >     <analyzer type="query">
> > > >> >         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>

Issue with Querying Solr

2015-12-08 Thread Salman Ansari
Hi,

I have created a cluster of Solr and Zookeepers on 3 machines connected
together. Currently, I am facing a weird problem. My collection has only
261 documents and when I try to query the documents using the browser such
as

http://[ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)

it returns the documents properly. However, when I try to do the same using
Solr.NET, it throws java.lang.OutOfMemoryError: Java heap space exception
(although I have very few documents there). Any ideas why I am getting this
error?

Regards,
Salman


Re: Issue with Querying Solr

2015-12-08 Thread Salman Ansari
Thanks Andrea and Alexandre for your responses. Indeed the problem was
that Solr.NET was requesting many rows (as I captured with Fiddler).
Currently, my setup has only 500MB of JVM heap (which I will definitely
increase) but at least I found the culprit by reducing the number of rows
returned.

Regards,
Salman

On Tue, Dec 8, 2015 at 5:30 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Solr by default only returns 10 rows. SolrNet by default returns many
> rows. I don't know why that would cause OOM, but that's definitely
> your difference unless you dealt with it:
>
> https://github.com/mausch/SolrNet/blob/master/Documentation/Querying.md#pagination
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 8 December 2015 at 07:52, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Hi,
> >
> > I have created a cluster of Solr and Zookeepers on 3 machines connected
> > together. Currently, I am facing a weird problem. My collection has only
> > 261 documents and when I try to query the documents using the browser
> such
> > as
> >
> > http://[ASolrServerInTheCluster]:8983/solr/sabrLocationsStore/select?q=(*:*)
> >
> > it returns the documents properly. However, when I try to do the same
> using
> > Solr.NET, it throws java.lang.OutOfMemoryError: Java heap space exception
> > (although I have very few documents there). Any ideas why I am getting
> this
> > error?
> >
> > Regards,
> > Salman
>


Re: Solr Auto-Complete

2015-12-06 Thread Salman Ansari
Hi,



I have updated my schema.xml as mentioned in the previous posts using



<fieldType name="..." class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

This does the auto-complete, but it does it at every portion of the text
(not just at the beginning) (prefix). So searching for "And" in my field
for locations returns both of the following documents.





1

AD

*And*orra

أندورا

1519794717684924416





5

AG

Antigua *and* Barbuda

أنتيجوا وبربودا

1519794717701701633





I have read about this and at first I thought I need to add side="front"
but after adding that, Solr returned an error (when creating a collection)
indicating "Unknown parameters 

Re: Solr Auto-Complete

2015-12-06 Thread Salman Ansari
Thanks a lot Andrea. It did work.

However, just for my understanding, can you please explain more how did you
make it work for prefixes. I know you mentioned using another Tokenizer but
for example, if I want to tweak it later on to work on suffixes or within
phrases how should I go about that?

Thanks again for your help.

Regards,
Salman


On Sun, Dec 6, 2015 at 1:24 PM, Andrea Gazzarini <a.gazzar...@gmail.com>
wrote:

> Hi Salman,
> that's because you're using a StandardTokenizer. Try with something like
> this (copied, pasted and changed using my phone so probably with a lot of
> mistakes ;) but you should be able to get what I mean). BTW I don't know if
> that's the case but I would also put a MappingCharFilterFactory
>
> <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
>     <analyzer type="index">
>         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateAll="1" splitOnCaseChange="0"/>
>         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
>     </analyzer>
>     <analyzer type="query">
>         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
>         <tokenizer class="solr.KeywordTokenizerFactory"/>
>         <filter class="solr.LowerCaseFilterFactory"/>
>         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateAll="1" splitOnCaseChange="0"/>
>     </analyzer>
> </fieldType>
>
>
> 2015-12-06 9:36 GMT+01:00 Salman Ansari <salman.rah...@gmail.com>:
>
> > Hi,
> >
> >
> >
> > I have updated my schema.xml as mentioned in the previous posts using
> >
> >
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >     <analyzer type="index">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
> >     </analyzer>
> >     <analyzer type="query">
> >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >         <filter class="solr.LowerCaseFilterFactory"/>
> >     </analyzer>
> > </fieldType>
> >
> >
> >
> > This does the auto-complete, but it does it at every portion of the text
> > (not just at the beginning) (prefix). So searching for "And" in my field
> > for locations returns both of the following documents.
> >
> >
> >
> > 
> >
> > 1
> >
> > AD
> >
> > *And*orra
> >
> > أندورا
> >
> > 1519794717684924416
> >
> > 
> >
> > 
> >
> > 5
> >
> > AG
> >
> > Antigua *and* Barbuda
> >
> > أنتيجوا وبربودا
> >
> > 1519794717701701633
> >
> > 
> >
> >
> >
> > I have read about this and at first I thought I need to add side="front"
> > but after adding that, Solr returned an error (when creating a
> collection)
> > indicating "Unknown parameters 

Re: Solr Auto-Complete

2015-12-06 Thread Salman Ansari
That is right. I am actually looking for phrase prefixes not each term
prefix within the phrase. That satisfies my requirements. However, my
additional question was how do I manipulate the fieldType to later allow
for suffix matches as well? or will that be a completely different
fieldType definition?

Regards,
Salman


On Sun, Dec 6, 2015 at 2:12 PM, Andrea Gazzarini <a.gazzar...@gmail.com>
wrote:

> Sorry, my damned mobile: "Is that close to what you were looking for?"
>
> 2015-12-06 12:07 GMT+01:00 Andrea Gazzarini <a.gazzar...@gmail.com>:
>
> > Do you mean "phrase" or "term" prefixes? If you try to put a field value
> > (two or more terms) in the analysis page you will see what the index
> > analyzer chain (of my example field type) is doing. The whole value is
> > managed as a single-ngrammed token, so you will get only a phrase prefix
> > search, as in your request.
> >
> > If you want to manage also terms prefixes, I would also index another
> > field (similar to the example you posted); then, the search handler with
> > e(dismax) would have something like this:
> >
> >
> > <str name="qf">
> >     text_suggestion_phrase_prefix_search^b1
> >     text_suggestion_terms_prefix_search^b2
> > </str>
> >
> >
> > b1 and b2 values strictly depend on your search logic.
> >
> > Is that close that what you were looking for?
> >
> > Best,
> > Andrea
> >
> >
> >
> > 2015-12-06 11:53 GMT+01:00 Salman Ansari <salman.rah...@gmail.com>:
> >
> >> Thanks a lot Andrea. It did work.
> >>
> >> However, just for my understanding, can you please explain more how did
> >> you
> >> make it work for prefixes. I know you mentioned using another Tokenizer
> >> but
> >> for example, if I want to tweak it later on to work on suffixes or
> within
> >> phrases how should I go about that?
> >>
> >> Thanks again for your help.
> >>
> >> Regards,
> >> Salman
> >>
> >>
> >> On Sun, Dec 6, 2015 at 1:24 PM, Andrea Gazzarini <a.gazzar...@gmail.com
> >
> >> wrote:
> >>
> >> > Hi Salman,
> >> > that's because you're using a StandardTokenizer. Try with something
> like
> >> > this (copied, pasted and changed using my phone so probably with a lot
> >> of
> >> > mistakes ;) but you should be able to get what I mean). BTW I don't
> >> know if
> >> > that's the case but I would also put a MappingCharFilterFactory
> >> >
> >> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >> >     <analyzer type="index">
> >> >         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
> >> >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> >         <filter class="solr.LowerCaseFilterFactory"/>
> >> >         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateAll="1" splitOnCaseChange="0"/>
> >> >         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
> >> >     </analyzer>
> >> >     <analyzer type="query">
> >> >         <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-FoldToASCII.txt"/>
> >> >         <tokenizer class="solr.KeywordTokenizerFactory"/>
> >> >         <filter class="solr.LowerCaseFilterFactory"/>
> >> >         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateAll="1" splitOnCaseChange="0"/>
> >> >     </analyzer>
> >> > </fieldType>
> >> >
> >> >
> >> > 2015-12-06 9:36 GMT+01:00 Salman Ansari <salman.rah...@gmail.com>:
> >> >
> >> > > Hi,
> >> > >
> >> > >
> >> > >
> >> > > I have updated my schema.xml as mentioned in the previous posts
> using
> >> > >
> >> > >
> >> > >
> >> > > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >> > >     <analyzer type="index">
> >> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >> > >         <filter class="solr.LowerCaseFilterFactory"/>
> >> > >         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
> >> > >     </analyzer>
> >> > >     <analyzer type="query">
> >> > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> >> > >         <filter class="solr.LowerCaseFilterFactory"/>
> >> > >     </analyzer>
> >> > > </fieldType>
> >> > >
> >> > >
> >> > >
> >> > > This does the auto-complete, but it does it at every portion of the
> >> text
> >> > > (not just at the beginning) (prefix). So searching for "And" in my
> >> field
> >> > > for locations returns both of the following documents.
> >> > >
> >> > >
> >> > >
> >> > > 
> >> > >
> >> > > 1
> >> > >
> >> > > AD
> >> > >
> >> > > *And*orra
> >> > >
> >> > > أندورا
> >> > >
> >> > > 1519794717684924416
> >> > >
> >> > > 
> >> > >
> >> > > 
> >> > >
> >> > > 5
> >> > >
> >> > > AG
> >> > >
> >> > > Antigua *and* Barbuda
> >> > >
> >> > > أنتيجوا وبربودا
> >> > >
> >> > > 1519794717701701633
> >> > >
> >> > > 
> >> > >
> >> > >
> >> > >
> >> > > I have read about this and at first I thought I need to add
> >> side="front"
> >> > > but after adding that, Solr returned an error (when creating a
> >> > collection)
> >> > > indicating "Unknown parameters 

Re: Solr Auto-Complete

2015-12-04 Thread Salman Ansari
Thanks Alan, Alessandaro and Andrea for your great explanations. I will
follow the path of adding edge ngrams to the field type for my use case.

Regards,
Salman

On Thu, Dec 3, 2015 at 12:23 PM, Alessandro Benedetti <abenede...@apache.org
> wrote:

> "Sounds good but I heard "/suggest" component is the recommended way of
> doing auto-complete"
>
> This sounds fantastic :)
> We "heard" that as well, we know what the suggest component does.
> The point is that you would like to retrieve the suggestions + some
> consistent payload in different fields.
> Current suggest component offers some effort in providing a payload, but
> almost all the suggester implementations are based on an FST approach which
> aims to be as fast and memory efficient as possible.
> Honestly you could experiment and even contribute a customisation if you
> want to add a new feature to the suggest component able to return complex
> payloads together with the suggestions.
> Apart from that, it strictly depends on how you want to provide the
> autocompletion; there are plenty of different lookup implementations and
> plenty of tokenizers/token filters to combine.
> So I would confirm what we already said and that Andrea confirmed.
>
> If anyone has played with the suggester suggestions payload, his feedback
> is welcome!
>
> Cheers
>
>
> On 3 December 2015 at 06:21, Andrea Gazzarini <a.gazzar...@gmail.com>
> wrote:
>
> > Hi Salman,
> > few months ago I have been involved in a project similar to
> > map.geoadmin.ch
> > and there, I had your same need (I also sent an email to this list).
> >
> > From my side I can furtherly confirm what Alan and Alessandro already
> > explained, I followed that approach.
> >
> > IMHO, that is the "recommended way" if the component's features meet your
> > needs (i.e. do not reinvent the wheel) but it seems you're out of those
> > bounds.
> >
> > Best,
> > Andrea
> > On 2 Dec 2015 21:51, "Salman Ansari" <salman.rah...@gmail.com> wrote:
> >
> > > Sounds good but I heard "/suggest" component is the recommended way of
> > > doing auto-complete in the new versions of Solr. Something along the
> > lines
> > > of this article
> > > https://cwiki.apache.org/confluence/display/solr/Suggester
> > >
> > > 
> > >   
> > > <searchComponent name="suggest" class="solr.SuggestComponent">
> > >   <lst name="suggester">
> > >     <str name="name">mySuggester</str>
> > >     <str name="lookupImpl">FuzzyLookupFactory</str>
> > >     <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> > >     <str name="field">cat</str>
> > >     <str name="weightField">price</str>
> > >     <str name="suggestAnalyzerFieldType">string</str>
> > >     <str name="buildOnStartup">false</str>
> > >   </lst>
> > > </searchComponent>
> > >
> > > Can someone confirm this?
> > >
> > > Regards,
> > > Salman
> > >
> > >
> > > On Wed, Dec 2, 2015 at 1:14 PM, Alessandro Benedetti <
> > > abenede...@apache.org>
> > > wrote:
> > >
> > > > Hi Salman,
> > > > I agree with Alan.
> > > > Just configure your schema with the proper analysers .
> > > > For the field you want to use for suggestions you are likely to need
> > > simply
> > > > this fieldType :
> > > >
> > > > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> > > >     <analyzer type="index">
> > > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >         <filter class="solr.LowerCaseFilterFactory"/>
> > > >         <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
> > > >     </analyzer>
> > > >     <analyzer type="query">
> > > >         <tokenizer class="solr.StandardTokenizerFactory"/>
> > > >         <filter class="solr.LowerCaseFilterFactory"/>
> > > >     </analyzer>
> > > > </fieldType>
> > > >
> > > > This is a very sample example, please adapt it to your use case.
> > > >
> > > > Cheers
> > > >
> > > > On 2 December 2015 at 09:41, Alan Woodward <a...@flax.co.uk> wrote:
> > > >
> > > > > Hi Salman,
> > > > >
> > > > > It sounds as though you want to do a normal search against a
> special
> > > > > 'suggest' field, that's been indexed with edge ngrams.
> > > > >
> > > > > Alan Woodward
> > > > > www.flax.co.uk
> > > > >
> > > > >
> > > > > On 2 Dec 2015, at 09:31, Salman Ansari wrote:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am looking for auto-complete in Solr but on top of just auto
> > > > complete I
> > > > > > want as well to re

Re: Solr Auto-Complete

2015-12-02 Thread Salman Ansari
Sounds good but I heard "/suggest" component is the recommended way of
doing auto-complete in the new versions of Solr. Something along the lines
of this article
https://cwiki.apache.org/confluence/display/solr/Suggester


  
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">cat</str>
    <str name="weightField">price</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

Can someone confirm this?

Regards,
Salman


On Wed, Dec 2, 2015 at 1:14 PM, Alessandro Benedetti <abenede...@apache.org>
wrote:

> Hi Salman,
> I agree with Alan.
> Just configure your schema with the proper analysers .
> For the field you want to use for suggestions you are likely to need simply
> this fieldType :
>
> <fieldType name="text_suggest" class="solr.TextField"
> positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1"
> maxGramSize="20"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> This is a very simple example; please adapt it to your use case.
>
> Cheers
>
> On 2 December 2015 at 09:41, Alan Woodward <a...@flax.co.uk> wrote:
>
> > Hi Salman,
> >
> > It sounds as though you want to do a normal search against a special
> > 'suggest' field, that's been indexed with edge ngrams.
> >
> > Alan Woodward
> > www.flax.co.uk
> >
> >
> > On 2 Dec 2015, at 09:31, Salman Ansari wrote:
> >
> > > Hi,
> > >
> > > I am looking for auto-complete in Solr but on top of just auto
> complete I
> > > want as well to return the data completely (not just suggestions), so I
> > > want to get back the ids, and other fields in the whole document. I
> tried
> > > the following 2 approaches but each had issues
> > >
> > > 1) Used the /suggest component but that returns a very specific format
> > > which looks like I cannot customize. I want to return the whole
> document
> > > that has a matching field and not only the suggestion list. So for
> > example,
> > > if I write "hard" it returns the results in a specific format as
> follows
> > >
> > >   hard drive
> > > hard disk
> > >
> > > Is there a way to get back additional fields with suggestions?
> > >
> > > 2) Tried the normal /select component but that does not do
> auto-complete
> > on
> > > portion of the word. So, for example, if I write the query as "bara" it
> > > DOES NOT return "barack obama". Any suggestions how to solve this?
> > >
> > >
> > > Regards,
> > > Salman
> >
> >
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Solr Auto-Complete

2015-12-02 Thread Salman Ansari
Hi,

I am looking for auto-complete in Solr, but on top of plain auto-complete I
also want to return the data in full (not just suggestions); that is, I want
to get back the ids and other fields of the whole document. I tried
the following 2 approaches but each had issues

1) I used the /suggest component, but it returns a very specific format that
apparently cannot be customized. I want to return the whole document that
has a matching field, not only the suggestion list. So, for example, if I
write "hard" it returns the results in a specific format as follows

  hard drive
hard disk

 Is there a way to get back additional fields with suggestions?

2) I tried the normal /select component, but it does not do auto-complete on
a portion of a word. So, for example, if I write the query "bara" it DOES
NOT return "barack obama". Any suggestions on how to solve this?


Regards,
Salman
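To illustrate why approach 2 fails without special analysis: an edge n-gram filter at index time stores every prefix of each token, so a prefix like "bara" becomes an ordinary stored term that a plain query can match. A rough sketch of what such a filter emits, assuming the minGramSize=1 / maxGramSize=20 settings discussed elsewhere in this thread:

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    # Emit every prefix from min_gram to max_gram characters, similar to what
    # an index-time EdgeNGramFilter(minGramSize=1, maxGramSize=20) produces.
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]

# Index "barack obama" token by token; the prefix "bara" becomes a stored
# term, so a plain term query for "bara" now matches the document.
grams = []
for tok in "barack obama".split():
    grams.extend(edge_ngrams(tok))

print(sorted(grams))
print("bara" in grams)  # True
```

This is why the replies below suggest a normal /select search against a dedicated "suggest" field with edge n-gram analysis: you get prefix matching and still retrieve every stored field of the document.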


Re: Setting up Solr on multiple machines

2015-11-29 Thread Salman Ansari
Correct me if I am wrong but my understanding is that even connecting to
one zookeeper should be enough as internally that zookeeper will sync Solr
server info to other zookeepers in the ensemble (as long as that zookeeper
belongs to an ensemble). Having said that, if that particular zookeeper
goes down, another one from the ensemble should be able to serve the Solr
instance.

What made me lean even more towards this understanding is that I tried
connecting two different Solr instances to two different ZooKeeper nodes
(both belonging to the same ensemble), and I found that both Solr servers
could see each other. That suggests the ZooKeeper nodes do share Solr
server information across the ensemble.

Regards,
Salman
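The failover behaviour described above can be sketched as a toy client-side host picker (a real ZooKeeper client walks the connect string internally, which is exactly why listing every node is the robust choice; host names below are placeholders):

```python
import random

def zk_hosts(connect_string):
    # Split a ZooKeeper connect string ("host:port,host:port,...") into entries.
    return [h.strip() for h in connect_string.split(",") if h.strip()]

def pick_host(connect_string, down=()):
    # Pick any node not known to be down; a real client retries through the
    # list automatically, so one dead node does not take the client down.
    live = [h for h in zk_hosts(connect_string) if h not in set(down)]
    if not live:
        raise RuntimeError("no ZooKeeper node reachable")
    return random.choice(live)

ensemble = "zk1:2181,zk2:2181,zk3:2181"  # placeholder host names
print(zk_hosts(ensemble))
print(pick_host(ensemble, down=["zk1:2181"]))
```

With a single-host connect string, the `down` case immediately raises: that is the single point of failure Walter describes below.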

On Mon, Nov 30, 2015 at 1:07 AM, Walter Underwood <wun...@wunderwood.org>
wrote:

> Why would that link answer the question?
>
> Each Solr connects to one Zookeeper node. If that node goes down,
> Zookeeper is still available, but the node will need to connect to a new
> node.
>
> Specifying only one zk node is a single point of failure. If that node
> goes down, Solr cannot continue operating.
>
> Specifying a list of all the zk nodes is robust. If one goes down, it
> tries another.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Nov 29, 2015, at 12:19 PM, Don Bosco Durai <bo...@apache.org> wrote:
> >
> > This should answer your question:
> https://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html#sc_designGoals
> >
> > On 11/29/15, 12:04 PM, "Salman Ansari" <salman.rah...@gmail.com> wrote:
> >
> >> my point is that what is the exact difference between the whole list and
> >> one zookeeper? Moreover, I think this issue is related to Windows
> command
> >> as mentioned here
> >>
> http://stackoverflow.com/questions/28837827/solr-5-0-unable-to-start-solr-with-zookeeper-ensemble
> >>
> >>
> >> On Sun, Nov 29, 2015 at 10:55 PM, Don Bosco Durai <bo...@apache.org>
> wrote:
> >>
> >>> It is highly recommended to list all, but for testing, you might be
> able
> >>> to get away giving only one.
> >>>
> >>> If the list doesn’t work, then you might even want to look into
> zookeeper
> >>> and see whether they are setup properly.
> >>>
> >>> Bosco
> >>>
> >>> On 11/29/15, 11:51 AM, "Salman Ansari" <salman.rah...@gmail.com>
> wrote:
> >>>
> >>>> but the point is: do I really need to list all the zookeepers in the
> >>>> ensemble when starting solr or I can just specify one of them?
> >>>>
> >>>> On Sun, Nov 29, 2015 at 10:45 PM, Don Bosco Durai <bo...@apache.org>
> >>> wrote:
> >>>>
> >>>>> You might want to check the logs for why solr is not starting up.
> >>>>>
> >>>>>
> >>>>> Bosco
> >>>>>
> >>>>>
> >>>>> On 11/29/15, 11:30 AM, "Salman Ansari" <salman.rah...@gmail.com>
> wrote:
> >>>>>
> >>>>>> Thanks for your reply.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> Actually I am following the official guide to start solr using (on
> >>> Windows
> >>>>>> machines)
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> bin/solr start -e cloud -z zk1:2181,zk2:2182,zk3:2183
> >>>>>>
> >>>>>> (it is listed here
> >>>>>>
> >>>>>
> >>>
> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
> >>>>>> )
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> However, I am facing 2 issues
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 1) If I specify the full list of ensemble (even with quotes around
> -z
> >>>>>> "zk1:2181,zk2:2182,zk3:2183") it does not start Solr on port 8983
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> 2) Then I tried the workaround, which is specifying "localhost" on
> each
> >>>>>> Solr server to consult its local Zookeeper instance that is par

Re: Setting up Solr on multiple machines

2015-11-29 Thread Salman Ansari
Thanks for your reply.



Actually I am following the official guide to start solr using (on Windows
machines)



bin/solr start -e cloud -z zk1:2181,zk2:2182,zk3:2183

(it is listed here
https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
)



However, I am facing 2 issues



1) If I specify the full list of ensemble (even with quotes around -z
"zk1:2181,zk2:2182,zk3:2183") it does not start Solr on port 8983





2) Then I tried the workaround, which is specifying "localhost" on each
Solr server to consult its local Zookeeper instance that is part of the
ensemble, which worked as follows



bin/solr start -e cloud -z localhost:2181(on each machine that has
zookeeper as well)



I followed the wizard (on each machine) to create 2 shards on 2 ports and 2
replicas. For the first machine I created the "test" collection; for the
second one I just reused the same collection. Now Solr works on both
machines, but the issue is that the Solr admin page shows all the shards and
replicas of the collection on ONE MACHINE.


Any ideas why I am facing these issues?


Regards,

Salman

On Sun, Nov 29, 2015 at 10:07 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> 1> I'll pass
>
> 2a> yes.
> 2b> This should be automatic when you create the collection. You
> should specify numShards=2, replicationFactor=2 and
> maxShardsPerNode=2. Solr tries hard to distribute the shards and
> replicas on different machines.
>
> If you _really_ require exact placement, you can specify createNodeSet
> which will assign shards round-robin to the specified list or even
> EMPTY which will create no actual cores at all. In this latter case
> you could then use ADDREPLICA to place each shard and replica exactly
> where you want it to go.
>
> But I wouldn't bother first, just do what I outlined in 2b and it
> should be fine.
>
> Best,
> Erick
>
> On Sat, Nov 28, 2015 at 1:03 PM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > I have started with one Zookeeper to test things and I have the following
> > questions
> >
> > 1) In my zoo.cfg I have defined
> > tickTime=4000
> > dataDir=C:\\Solr\\Zookeeper\\zookeeper-3.4.6\\data
> > clientPort=2183
> >
> > the strange thing is that it picks up dataDir and clientPort but always
> > keeps tickTime = 3000. Any idea why?
> >
> > 2) It is clear from the documentation how to create an ensemble of
> > Zookeepers on 3 machines but what I am not sure about is how to
> >   a)  Setup actual Solr on 2 machines (is it just installing Solr on
> > each server and then passing the same zookeeper ensemble)?
> >   b) How to (using Solr Cloud) create 2 shards spread on 2 machines
> > with each machine having a replica of the other for high availability. So
> > server1 will have shard1 and replica2 and server2 will have shard2 and
> > replica1?
> >
> > Comments and feedback are appreciated.
> >
> > Regards,
> > Salman
> >
> >
> > On Fri, Nov 27, 2015 at 5:52 AM, Zheng Lin Edwin Yeo <
> edwinye...@gmail.com>
> > wrote:
> >
> >> Yes, the ZooKeeper is Windows compatible.
> >>
> >> You can follow the guide, just need to replace the Linux commands with
> the
> >> Windows commands and paths
> >>
> >> Regards,
> >> Edwin
> >>
> >>
> >> On 26 November 2015 at 20:56, Alessandro Benedetti <
> abenede...@apache.org>
> >> wrote:
> >>
> >> > I think it should be straightforward following the Solr wiki :
> >> >
> >> >
> >> >
> >>
> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
> >> >
> >> > I never played much in details with zookeeper ( never tried on a
> windows
> >> > machine), but I assume it is windows compatible ( I can see binaries
> >> .cmd )
> >> >
> >> > Cheers
> >> >
> >> > On 26 November 2015 at 12:38, Salman Ansari <salman.rah...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I have seen the guide of setting up Solr on one machine as well as
> >> > setting
> >> > > it up on multiple machines on Linux. Is there a good guide of how to
> >> > setup
> >> > > Solr on multiple machines on Windows Server with Zookeeper
> ensemble? My
> >> > > structure is as follows
> >> > >
> >> > > 1) 3 machines will have Zookeeper to create an ensemble
> >> > > 2) 2 of these machines will have Solr installed (with each having a
> >> > replica
> >> > > of other to provide high availability)
> >> > >
> >> > > Any link/article that provides such a guide?
> >> > >
> >> > > Regards,
> >> > > Salman
> >> > >
> >> >
> >> >
> >> >
> >> > --
> >> > --
> >> >
> >> > Benedetti Alessandro
> >> > Visiting card : http://about.me/alessandro_benedetti
> >> >
> >> > "Tyger, tyger burning bright
> >> > In the forests of the night,
> >> > What immortal hand or eye
> >> > Could frame thy fearful symmetry?"
> >> >
> >> > William Blake - Songs of Experience -1794 England
> >> >
> >>
>


Re: Setting up Solr on multiple machines

2015-11-29 Thread Salman Ansari
Also, I am interested in knowing how to create a collection where a shard
and its replica do not reside on the same machine. So, basically, shard1
with replica2 on one machine and shard2 with replica1 on the other. Is that
the default behaviour when creating a collection with 2 shards and 2
replicas?
On Nov 30, 2015 1:36 AM, "Salman Ansari" <salman.rah...@gmail.com> wrote:

> Correct me if I am wrong but my understanding is that even connecting to
> one zookeeper should be enough as internally that zookeeper will sync Solr
> server info to other zookeepers in the ensemble (as long as that zookeeper
> belongs to an ensemble). Having said that, if that particular zookeeper
> goes down, another one from the ensemble should be able to serve the Solr
> instance.
>
> What made me even more leaning towards this understanding is that I tried
> connecting 2 different solr instances to 2 different zookeepers (but both
> belong to the same ensemble) and I realized both Solr servers can see each
> other. I guess that does explain somehow that zookeepers are sharing solr
> servers information among the ensemble.
>
> Regards,
> Salman
>
> On Mon, Nov 30, 2015 at 1:07 AM, Walter Underwood <wun...@wunderwood.org>
> wrote:
>
>> Why would that link answer the question?
>>
>> Each Solr connects to one Zookeeper node. If that node goes down,
>> Zookeeper is still available, but the node will need to connect to a new
>> node.
>>
>> Specifying only one zk node is a single point of failure. If that node
>> goes down, Solr cannot continue operating.
>>
>> Specifying a list of all the zk nodes is robust. If one goes down, it
>> tries another.
>>
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>>
>> > On Nov 29, 2015, at 12:19 PM, Don Bosco Durai <bo...@apache.org> wrote:
>> >
>> > This should answer your question:
>> https://zookeeper.apache.org/doc/r3.2.2/zookeeperOver.html#sc_designGoals
>> >
>> > On 11/29/15, 12:04 PM, "Salman Ansari" <salman.rah...@gmail.com> wrote:
>> >
>> >> my point is that what is the exact difference between the whole list
>> and
>> >> one zookeeper? Moreover, I think this issue is related to Windows
>> command
>> >> as mentioned here
>> >>
>> http://stackoverflow.com/questions/28837827/solr-5-0-unable-to-start-solr-with-zookeeper-ensemble
>> >>
>> >>
>> >> On Sun, Nov 29, 2015 at 10:55 PM, Don Bosco Durai <bo...@apache.org>
>> wrote:
>> >>
>> >>> It is highly recommended to list all, but for testing, you might be
>> able
>> >>> to get away giving only one.
>> >>>
>> >>> If the list doesn’t work, then you might even want to look into
>> zookeeper
>> >>> and see whether they are setup properly.
>> >>>
>> >>> Bosco
>> >>>
>> >>> On 11/29/15, 11:51 AM, "Salman Ansari" <salman.rah...@gmail.com>
>> wrote:
>> >>>
>> >>>> but the point is: do I really need to list all the zookeepers in the
>> >>>> ensemble when starting solr or I can just specify one of them?
>> >>>>
>> >>>> On Sun, Nov 29, 2015 at 10:45 PM, Don Bosco Durai <bo...@apache.org>
>> >>> wrote:
>> >>>>
>> >>>>> You might want to check the logs for why solr is not starting up.
>> >>>>>
>> >>>>>
>> >>>>> Bosco
>> >>>>>
>> >>>>>
>> >>>>> On 11/29/15, 11:30 AM, "Salman Ansari" <salman.rah...@gmail.com>
>> wrote:
>> >>>>>
>> >>>>>> Thanks for your reply.
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> Actually I am following the official guide to start solr using (on
>> >>> Windows
>> >>>>>> machines)
>> >>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>> bin/solr start -e cloud -z zk1:2181,zk2:2182,zk3:2183
>> >>>>>>
>> >>>>>> (it is listed here
>> >>>>>>
>> >>>>>
>> >>>
>> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
>> >>>>>> )

Re: Setting up Solr on multiple machines

2015-11-29 Thread Salman Ansari
but the point is: do I really need to list all the zookeepers in the
ensemble when starting Solr, or can I just specify one of them?

On Sun, Nov 29, 2015 at 10:45 PM, Don Bosco Durai <bo...@apache.org> wrote:

> You might want to check the logs for why solr is not starting up.
>
>
> Bosco
>
>
> On 11/29/15, 11:30 AM, "Salman Ansari" <salman.rah...@gmail.com> wrote:
>
> >Thanks for your reply.
> >
> >
> >
> >Actually I am following the official guide to start solr using (on Windows
> >machines)
> >
> >
> >
> >bin/solr start -e cloud -z zk1:2181,zk2:2182,zk3:2183
> >
> >(it is listed here
> >
> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
> >)
> >
> >
> >
> >However, I am facing 2 issues
> >
> >
> >
> >1) If I specify the full list of ensemble (even with quotes around -z
> >"zk1:2181,zk2:2182,zk3:2183") it does not start Solr on port 8983
> >
> >
> >
> >
> >
> >2) Then I tried the workaround, which is specifying "localhost" on each
> >Solr server to consult its local Zookeeper instance that is part of the
> >ensemble, which worked as follows
> >
> >
> >
> >bin/solr start -e cloud -z localhost:2181(on each machine that has
> >zookeeper as well)
> >
> >
> >
> >I followed the wizard (on each machine) to create 2 shards on 2 ports and
> 2
> >replicas. For the first machine I created "test" collection, but for the
> >second one I just reused the same collection. Now, Solr works on both
> >machines but the issue is that when I see Solr admin page, it shows all
> the
> >shards and replicas of the collection on ONE MACHINE.
> >
> >
> >Any ideas why I am facing these issues?
> >
> >
> >Regards,
> >
> >Salman
> >
> >On Sun, Nov 29, 2015 at 10:07 PM, Erick Erickson <erickerick...@gmail.com
> >
> >wrote:
> >
> >> 1> I'll pass
> >>
> >> 2a> yes.
> >> 2b> This should be automatic when you create the collection. You
> >> should specify numShards=2, replicationFactor=2 and
> >> maxShardsPerNode=2. Solr tries hard to distribute the shards and
> >> replicas on different machines.
> >>
> >> If you _really_ require exact placement, you can specify createNodeSet
> >> which will assign shards round-robin to the specified list or even
> >> EMPTY which will create no actual cores at all. In this latter case
> >> you could then use ADDREPLICA to place each shard and replica exactly
> >> where you want it to go.
> >>
> >> But I wouldn't bother first, just do what I outlined in 2b and it
> >> should be fine.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sat, Nov 28, 2015 at 1:03 PM, Salman Ansari <salman.rah...@gmail.com
> >
> >> wrote:
> >> > I have started with one Zookeeper to test things and I have the
> following
> >> > questions
> >> >
> >> > 1) In my zoo.cfg I have defined
> >> > tickTime=4000
> >> > dataDir=C:\\Solr\\Zookeeper\\zookeeper-3.4.6\\data
> >> > clientPort=2183
> >> >
> >> > the strange thing is that it picks up dataDir and clientPort but
> always
> >> > keeps tickTime = 3000. Any idea why?
> >> >
> >> > 2) It is clear from the documentation how to create an ensemble of
> >> > Zookeepers on 3 machines but what I am not sure about is how to
> >> >   a)  Setup actual Solr on 2 machines (is it just installing Solr
> on
> >> > each server and then passing the same zookeeper ensemble)?
> >> >   b) How to (using Solr Cloud) create 2 shards spread on 2
> machines
> >> > with each machine having a replica of the other for high
> availability. So
> >> > server1 will have shard1 and replica2 and server2 will have shard2 and
> >> > replica1?
> >> >
> >> > Comments and feedback are appreciated.
> >> >
> >> > Regards,
> >> > Salman
> >> >
> >> >
> >> > On Fri, Nov 27, 2015 at 5:52 AM, Zheng Lin Edwin Yeo <
> >> edwinye...@gmail.com>
> >> > wrote:
> >> >
> >> >> Yes, the ZooKeeper is Windows compatible.
> >> >>
> >> >> You can follow the guide, just need to replace the Linux commands
> with
> >> the
> &g

Re: Setting up Solr on multiple machines

2015-11-29 Thread Salman Ansari
my point is: what exactly is the difference between the whole list and one
zookeeper? Moreover, I think this issue is related to the Windows command
as mentioned here
http://stackoverflow.com/questions/28837827/solr-5-0-unable-to-start-solr-with-zookeeper-ensemble


On Sun, Nov 29, 2015 at 10:55 PM, Don Bosco Durai <bo...@apache.org> wrote:

> It is highly recommended to list all, but for testing, you might be able
> to get away giving only one.
>
> If the list doesn’t work, then you might even want to look into zookeeper
> and see whether they are setup properly.
>
> Bosco
>
>
>
>
>
> On 11/29/15, 11:51 AM, "Salman Ansari" <salman.rah...@gmail.com> wrote:
>
> >but the point is: do I really need to list all the zookeepers in the
> >ensemble when starting solr or I can just specify one of them?
> >
> >On Sun, Nov 29, 2015 at 10:45 PM, Don Bosco Durai <bo...@apache.org>
> wrote:
> >
> >> You might want to check the logs for why solr is not starting up.
> >>
> >>
> >> Bosco
> >>
> >>
> >> On 11/29/15, 11:30 AM, "Salman Ansari" <salman.rah...@gmail.com> wrote:
> >>
> >> >Thanks for your reply.
> >> >
> >> >
> >> >
> >> >Actually I am following the official guide to start solr using (on
> Windows
> >> >machines)
> >> >
> >> >
> >> >
> >> >bin/solr start -e cloud -z zk1:2181,zk2:2182,zk3:2183
> >> >
> >> >(it is listed here
> >> >
> >>
> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
> >> >)
> >> >
> >> >
> >> >
> >> >However, I am facing 2 issues
> >> >
> >> >
> >> >
> >> >1) If I specify the full list of ensemble (even with quotes around -z
> >> >"zk1:2181,zk2:2182,zk3:2183") it does not start Solr on port 8983
> >> >
> >> >
> >> >
> >> >
> >> >
> >> >2) Then I tried the workaround, which is specifying "localhost" on each
> >> >Solr server to consult its local Zookeeper instance that is part of the
> >> >ensemble, which worked as follows
> >> >
> >> >
> >> >
> >> >bin/solr start -e cloud -z localhost:2181(on each machine that has
> >> >zookeeper as well)
> >> >
> >> >
> >> >
> >> >I followed the wizard (on each machine) to create 2 shards on 2 ports
> and
> >> 2
> >> >replicas. For the first machine I created "test" collection, but for
> the
> >> >second one I just reused the same collection. Now, Solr works on both
> >> >machines but the issue is that when I see Solr admin page, it shows all
> >> the
> >> >shards and replicas of the collection on ONE MACHINE.
> >> >
> >> >
> >> >Any ideas why I am facing these issues?
> >> >
> >> >
> >> >Regards,
> >> >
> >> >Salman
> >> >
> >> >On Sun, Nov 29, 2015 at 10:07 PM, Erick Erickson <
> erickerick...@gmail.com
> >> >
> >> >wrote:
> >> >
> >> >> 1> I'll pass
> >> >>
> >> >> 2a> yes.
> >> >> 2b> This should be automatic when you create the collection. You
> >> >> should specify numShards=2, replicationFactor=2 and
> >> >> maxShardsPerNode=2. Solr tries hard to distribute the shards and
> >> >> replicas on different machines.
> >> >>
> >> >> If you _really_ require exact placement, you can specify
> createNodeSet
> >> >> which will assign shards round-robin to the specified list or even
> >> >> EMPTY which will create no actual cores at all. In this latter case
> >> >> you could then use ADDREPLICA to place each shard and replica exactly
> >> >> where you want it to go.
> >> >>
> >> >> But I wouldn't bother first, just do what I outlined in 2b and it
> >> >> should be fine.
> >> >>
> >> >> Best,
> >> >> Erick
> >> >>
> >> >> On Sat, Nov 28, 2015 at 1:03 PM, Salman Ansari <
> salman.rah...@gmail.com
> >> >
> >> >> wrote:
> >> >> > I have started with one Zookeeper to test things and I have the

Re: Setting up Solr on multiple machines

2015-11-28 Thread Salman Ansari
I have started with one Zookeeper to test things and I have the following
questions

1) In my zoo.cfg I have defined
tickTime=4000
dataDir=C:\\Solr\\Zookeeper\\zookeeper-3.4.6\\data
clientPort=2183

the strange thing is that it picks up dataDir and clientPort but always
keeps tickTime = 3000. Any idea why?

2) It is clear from the documentation how to create an ensemble of
Zookeepers on 3 machines but what I am not sure about is how to
  a)  Setup actual Solr on 2 machines (is it just installing Solr on
each server and then passing the same zookeeper ensemble)?
  b) How to (using Solr Cloud) create 2 shards spread on 2 machines
with each machine having a replica of the other for high availability. So
server1 will have shard1 and replica2 and server2 will have shard2 and
replica1?

Comments and feedback are appreciated.

Regards,
Salman
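The cross-placement asked about in 2b can be sketched as a toy round-robin (this is not Solr's actual placement logic, which Erick describes below via numShards / replicationFactor / maxShardsPerNode; node names are placeholders):

```python
def place_replicas(nodes, num_shards=2, replication_factor=2):
    # Rotate the node list per replica so copies of the same shard land on
    # different nodes whenever enough nodes exist.
    return {
        "shard%d" % (s + 1): [nodes[(s + r) % len(nodes)]
                              for r in range(replication_factor)]
        for s in range(num_shards)
    }

layout = place_replicas(["server1", "server2"])
print(layout)
# {'shard1': ['server1', 'server2'], 'shard2': ['server2', 'server1']}
```

So with 2 nodes, 2 shards, and replicationFactor=2, each server carries one copy of each shard, which is exactly the shard1+replica2 / shard2+replica1 layout asked about.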


On Fri, Nov 27, 2015 at 5:52 AM, Zheng Lin Edwin Yeo <edwinye...@gmail.com>
wrote:

> Yes, the ZooKeeper is Windows compatible.
>
> You can follow the guide, just need to replace the Linux commands with the
> Windows commands and paths
>
> Regards,
> Edwin
>
>
> On 26 November 2015 at 20:56, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
> > I think it should be straightforward following the Solr wiki :
> >
> >
> >
> https://cwiki.apache.org/confluence/display/solr/Setting+Up+an+External+ZooKeeper+Ensemble
> >
> > I never played much in details with zookeeper ( never tried on a windows
> > machine), but I assume it is windows compatible ( I can see binaries
> .cmd )
> >
> > Cheers
> >
> > On 26 November 2015 at 12:38, Salman Ansari <salman.rah...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have seen the guide of setting up Solr on one machine as well as
> > setting
> > > it up on multiple machines on Linux. Is there a good guide of how to
> > setup
> > > Solr on multiple machines on Windows Server with Zookeeper ensemble? My
> > > structure is as follows
> > >
> > > 1) 3 machines will have Zookeeper to create an ensemble
> > > 2) 2 of these machines will have Solr installed (with each having a
> > replica
> > > of other to provide high availability)
> > >
> > > Any link/article that provides such a guide?
> > >
> > > Regards,
> > > Salman
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>


Setting up Solr on multiple machines

2015-11-26 Thread Salman Ansari
Hi,

I have seen the guide for setting up Solr on one machine as well as setting
it up on multiple machines on Linux. Is there a good guide on how to set up
Solr on multiple machines on Windows Server with a Zookeeper ensemble? My
structure is as follows

1) 3 machines will have Zookeeper to create an ensemble
2) 2 of these machines will have Solr installed (with each having a replica
of other to provide high availability)

Any link/article that provides such a guide?

Regards,
Salman


Solr Date Format

2015-11-25 Thread Salman Ansari
Hi,

I was exploring Solr date formats and came across the following link
https://cwiki.apache.org/confluence/display/solr/Working+with+Dates

which specifies that the date format in Solr is YYYY-MM-DDThh:mm:ssZ.
I was wondering:

1) Does Solr support other date formats?
2) Does Solr support other calendars, such as the Hijri calendar?

Regards,
Salman
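For reference, producing that canonical UTC form from code is straightforward; a minimal sketch (nothing here is Solr-specific, it is just ISO-8601 with a trailing Z):

```python
from datetime import datetime, timezone

def to_solr_date(dt):
    # Render an aware datetime in Solr's canonical UTC form:
    # YYYY-MM-DDThh:mm:ssZ
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_date(datetime(2015, 11, 25, 9, 30, 0, tzinfo=timezone.utc)))
# 2015-11-25T09:30:00Z
```

Other input formats or calendars would have to be converted to this form on the client side before indexing.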


Re: Solr Cloud and Multiple Indexes

2015-11-08 Thread Salman Ansari
Just to give you a context of what I am talking about, I am collecting data
from different sources (such as articles, videos etc.). Moreover, I will be
doing enrichment on the data such as Entity Extraction. From my previous
experiment with Solr what I was doing is dumping all articles, videos meta
data into a single index (distributed into multiple shards). Now that made
the whole query very slow. So for entity extraction, I created another
index on the same shards and pushed entities there. This actually made
querying entities very quick as there was very little data on that index
(although it was residing on the same machine as the main index).

Based on that quick experiment, I was wondering whether I need a different
approach for my data. For example, instead of relying solely on SolrCloud to
distribute my data across shards, why not create a separate index for each
type of data I have (articles, videos, etc.) and then perform some sort of
distributed search over them? Would that be better in some sense, such as
performance?

Which version of solr are you using?
Currently, I am using Solr 5.3. By the way, I could not find the "Segments
info" link. Is it under Core Admin?

Regards,
Salman


On Fri, Nov 6, 2015 at 7:26 AM, Modassar Ather <modather1...@gmail.com>
wrote:

> Thanks for your response. I have already gone through those documents
> before. My point was that if I am using Solr Cloud the only way to
> distribute my indexes is by adding shards? and I don't have to do anything
> manually (because all the distributed search is handled by Solr Cloud).
>
> Yes as per my knowledge.
>
> How do I check how many segments are there in the index?
> You can see into the index folder manually. Which version of solr are you
> using? I don't remember exactly the start version but in the latest and
> Solr-5.2.1 there is a "Segments info" link available where you can see
> number of segments.
>
> Regards,
> Modassar
>
> On Thu, Nov 5, 2015 at 5:41 PM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Thanks for your response. I have already gone through those documents
> > before. My point was that if I am using Solr Cloud the only way to
> > distribute my indexes is by adding shards? and I don't have to do
> anything
> > manually (because all the distributed search is handled by Solr Cloud).
> >
> > What is the Xms and Xmx you are allocating to Solr and how much max is
> > used by
> > your solr?
> > Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB
> >
> > How many segments are there in the index? The more the segment the slower
> > is
> > the search.
> > How do I check how many segments are there in the index?
> >
> > Is this after you moved to solrcloud?
> > I have been using SolrCloud from the beginning.
> >
> > Regards,
> > Salman
> >
> >
> > On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather <modather1...@gmail.com>
> > wrote:
> >
> > > SolrCloud makes the distributed search easier. You can find details
> about
> > > it under following link.
> > > https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
> > >
> > > You can also refer to following link:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
> > >
> > > From size of your index I meant index size and not the total document
> > > alone.
> > > How many segments are there in the index? The more the segment the
> slower
> > > is the search.
> > > What is the Xms and Xmx you are allocating to Solr and how much max is
> > used
> > > by your solr?
> > >
> > > I doubt this as the slowness was happening for a long period of time.
> > > I mentioned this point as I have seen gc pauses of 30 seconds and more
> in
> > > some complex queries.
> > >
> > > I am facing delay of 2-3 seconds but previously I
> > > had delays of around 28 seconds.
> > > Is this after you moved to solrcloud?
> > >
> > > Regards,
> > > Modassar
> > >
> > >
> > > On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari <salman.rah...@gmail.com
> >
> > > wrote:
> > >
> > > > Here is the current info
> > > >
> > > > How much memory is used?
> > > > Physical memory consumption: 5.48 GB out of 14 GB.
> > > > Swap space consumption: 5.83 GB out of 15.94 GB.
> > > > JVM-Memory consumption: 1.58 GB out of 3.83 GB.
> > > >
> > > > What is your index size?
> > > > I have around 

Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Hi,

I am using Solr cloud and I have created a single index that host around
70M documents distributed into 2 shards (each having 35M documents) and 2
replicas. The queries are very slow to run so I was thinking to distribute
the indexes into multiple indexes and consequently distributed search. Can
anyone guide me to some sources (articles) that discuss this in Solr Cloud?

Appreciate your feedback regarding this.

Regards,
Salman


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Here is the current info

How much memory is used?
Physical memory consumption: 5.48 GB out of 14 GB.
Swap space consumption: 5.83 GB out of 15.94 GB.
JVM-Memory consumption: 1.58 GB out of 3.83 GB.

What is your index size?
I have around 70M documents distributed on 2 shards (so each shard has 35M
document)

What type of queries are slow?
I am running normal queries (queries on a field); no faceting or highlighting
is requested. Currently, I am facing delays of 2-3 seconds, but previously I
had delays of around 28 seconds.

Are there GC pauses as they can be a cause of slowness?
I doubt this as the slowness was happening for a long period of time.

Are document updates/additions happening in parallel?
No, I have stopped adding/updating documents and doing queries only.

This is what you are already doing. Did you mean that you want to add more
shards?
No, what I meant is that I read that previously there was a way to chunk a
large index into multiple smaller indexes and then do distributed search over
them, as in this article: https://wiki.apache.org/solr/DistributedSearch. What
I was looking for is how this is handled in SolrCloud.
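
To make the contrast concrete, here is a sketch of the two request styles (host and collection names are made up; SolrCloud builds the shard fan-out for you, while the old wiki approach required it by hand):

```python
from urllib.parse import urlencode

# SolrCloud: query any node hosting the collection; it fans the request out
# to one replica of every shard and merges the results automatically.
cloud_url = "http://server1:8983/solr/mycollection/select?" + urlencode(
    {"q": "content_text:Football", "rows": 10})

# Pre-Cloud distributed search (the wiki page above): the client must list
# every shard explicitly on each request.
manual_url = "http://server1:8983/solr/core1/select?" + urlencode(
    {"q": "content_text:Football", "rows": 10,
     "shards": "server1:8983/solr/core1,server2:8983/solr/core1"})

print(cloud_url)
print(manual_url)
```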


Regards,
Salman





On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather <modather1...@gmail.com>
wrote:

> What is your index size? How much memory is used? What type of queries are
> slow?
> Are there GC pauses as they can be a cause of slowness?
> Are document updates/additions happening in parallel?
>
> The queries are very slow to run so I was thinking to distribute
> the indexes into multiple indexes and consequently distributed search. Can
> anyone guide me to some sources (articles) that discuss this in Solr Cloud?
>
> This is what you are already doing. Did you mean that you want to add more
> shards?
>
> Regards,
> Modassar
>
> On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I am using Solr cloud and I have created a single index that host around
> > 70M documents distributed into 2 shards (each having 35M documents) and 2
> > replicas. The queries are very slow to run so I was thinking to
> distribute
> > the indexes into multiple indexes and consequently distributed search.
> Can
> > anyone guide me to some sources (articles) that discuss this in Solr
> Cloud?
> >
> > Appreciate your feedback regarding this.
> >
> > Regards,
> > Salman
> >
>


Re: Solr Cloud and Multiple Indexes

2015-11-05 Thread Salman Ansari
Thanks for your response. I have already gone through those documents
before. My point was: if I am using SolrCloud, is adding shards the only way
to distribute my indexes, so that I don't have to do anything manually
(because all the distributed search is handled by SolrCloud)?

What is the Xms and Xmx you are allocating to Solr and how much max is used by
your solr?
Xms and Xmx are both 4G. My current JVM-Memory consumption is 1.58 GB

How many segments are there in the index? The more the segment the slower is
the search.
How do I check how many segments there are in the index?

Is this after you moved to solrcloud?
I have been using SolrCloud from the beginning.

Regards,
Salman


On Thu, Nov 5, 2015 at 1:21 PM, Modassar Ather <modather1...@gmail.com>
wrote:

> SolrCloud makes the distributed search easier. You can find details about
> it under following link.
> https://cwiki.apache.org/confluence/display/solr/How+SolrCloud+Works
>
> You can also refer to following link:
>
> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>
> From size of your index I meant index size and not the total document
> alone.
> How many segments are there in the index? The more the segment the slower
> is the search.
> What is the Xms and Xmx you are allocating to Solr and how much max is used
> by your solr?
>
> I doubt this as the slowness was happening for a long period of time.
> I mentioned this point as I have seen gc pauses of 30 seconds and more in
> some complex queries.
>
> I am facing delay of 2-3 seconds but previously I
> had delays of around 28 seconds.
> Is this after you moved to solrcloud?
>
> Regards,
> Modassar
>
>
> On Thu, Nov 5, 2015 at 3:09 PM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Here is the current info
> >
> > How much memory is used?
> > Physical memory consumption: 5.48 GB out of 14 GB.
> > Swap space consumption: 5.83 GB out of 15.94 GB.
> > JVM-Memory consumption: 1.58 GB out of 3.83 GB.
> >
> > What is your index size?
> > I have around 70M documents distributed on 2 shards (so each shard has
> 35M
> > document)
> >
> > What type of queries are slow?
> > I am running normal queries (queries on a field) no faceting or
> highlights
> > are requested. Currently, I am facing delay of 2-3 seconds but
> previously I
> > had delays of around 28 seconds.
> >
> > Are there GC pauses as they can be a cause of slowness?
> > I doubt this as the slowness was happening for a long period of time.
> >
> > Are document updates/additions happening in parallel?
> > No, I have stopped adding/updating documents and doing queries only.
> >
> > This is what you are already doing. Did you mean that you want to add
> more
> > shards?
> > No, what I meant is that I read that previously there was a way to chunk
> a
> > large index into multiple and then do distributed search on that as in
> this
> > article https://wiki.apache.org/solr/DistributedSearch. What I was
> looking
> > for how this is handled in Solr Cloud?
> >
> >
> > Regards,
> > Salman
> >
> >
> >
> >
> >
> > On Thu, Nov 5, 2015 at 12:06 PM, Modassar Ather <modather1...@gmail.com>
> > wrote:
> >
> > > What is your index size? How much memory is used? What type of queries
> > are
> > > slow?
> > > Are there GC pauses as they can be a cause of slowness?
> > > Are document updates/additions happening in parallel?
> > >
> > > The queries are very slow to run so I was thinking to distribute
> > > the indexes into multiple indexes and consequently distributed search.
> > Can
> > > anyone guide me to some sources (articles) that discuss this in Solr
> > Cloud?
> > >
> > > This is what you are already doing. Did you mean that you want to add
> > more
> > > shards?
> > >
> > > Regards,
> > > Modassar
> > >
> > > On Thu, Nov 5, 2015 at 1:51 PM, Salman Ansari <salman.rah...@gmail.com
> >
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > I am using Solr cloud and I have created a single index that host
> > around
> > > > 70M documents distributed into 2 shards (each having 35M documents)
> > and 2
> > > > replicas. The queries are very slow to run so I was thinking to
> > > distribute
> > > > the indexes into multiple indexes and consequently distributed
> search.
> > > Can
> > > > anyone guide me to some sources (articles) that discuss this in Solr
> > > Cloud?
> > > >
> > > > Appreciate your feedback regarding this.
> > > >
> > > > Regards,
> > > > Salman
> > > >
> > >
> >
>


Re: Solr Features

2015-11-05 Thread Salman Ansari
Thanks Alex for your response. Much appreciated effort! For sure, I will
need to look into all those details and information to fully understand Solr,
but I don't have that much time on my hands. That's why I was thinking that,
instead of reading everything from the beginning, I would start with a feature
list that briefly explains what each feature does and then dig deeper if I
need more information. I will appreciate any comments/feedback regarding
this.

Regards,
Salman

On Thu, Nov 5, 2015 at 2:56 PM, Alexandre Rafalovitch <arafa...@gmail.com>
wrote:

> Well, I've started to answer, but it hit a nerve and turned into a
> guide. Which is now a blog post with 6 steps (not mentioning step 0 -
> Admitting you have a problem).
>
> I hope this is helpful:
> http://blog.outerthoughts.com/2015/11/learning-solr-comprehensively/
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 5 November 2015 at 01:08, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Hi,
> >
> > I am in the process of looking for a comprehensive list of Solr features
> in
> > order to assess how much have we implemented, what are some features that
> > we were unaware of that we can utilize etc. I have looked at the
> following
> > link for Solr features http://lucene.apache.org/solr/features.html but
> it
> > looks like it highlights the main features. I also looked at this page
> > http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
> > details and I am looking for more of such list and possibly a
> comprehensive
> > list that combines them all.
> >
> > Regards,
> > Salman
>


Solr Features

2015-11-04 Thread Salman Ansari
Hi,

I am in the process of looking for a comprehensive list of Solr features in
order to assess how much have we implemented, what are some features that
we were unaware of that we can utilize etc. I have looked at the following
link for Solr features http://lucene.apache.org/solr/features.html but it
looks like it highlights the main features. I also looked at this page
http://www.typo3-solr.com/en/what-is-solr/features/ which gives some
details and I am looking for more of such list and possibly a comprehensive
list that combines them all.

Regards,
Salman


Re: Solr Pagination

2015-10-28 Thread Salman Ansari
I have already indexed all the documents in Solr and am not indexing anymore,
so the problem I am running into occurs after all the documents are indexed. I
am using SolrCloud with two shards and two replicas per shard, but on the
same machine. Is there anywhere I can look at the relation between index
size and machine specs and their effect on Solr query performance?

Regards,
Salman

On Mon, Oct 26, 2015 at 5:55 PM, Upayavira <u...@odoko.co.uk> wrote:

>
>
> On Sun, Oct 25, 2015, at 05:43 PM, Salman Ansari wrote:
> > Thanks guys for your responses.
> >
> > That's a very very large cache size.  It is likely to use a VERY large
> > amount of heap, and autowarming up to 4096 entries at commit time might
> > take many *minutes*.  Each filterCache entry is maxDoc/8 bytes.  On an
> > index core with 70 million documents, each filterCache entry is at least
> > 8.75 million bytes.  Multiply that by 16384, and a completely full cache
> > would need about 140GB of heap memory.  4096 entries will require 35GB.
> >  I don't think this cache is actually storing that many entries, or you
> > would most certainly be running into OutOfMemoryError exceptions.
> >
> > True, however, I have tried with the default filtercache at the beginning
> > but the problem was still there. So, I don't think that is how I should
> > increase the performance of my Solr. Moreover, as you mentioned, when I
> > change the configuration, I should be running out of memory but that did
> > not happen. Do you think my Solr has not taken the latest configs? I have
> > restarted the Solr btw.
> >
> > Lately I have been trying different ways to improve this and I have
> > created
> > a brand new index on the same machine using 2 shards and it had few
> > entries
> > (about 5) and the performance was booming, I got the results back in 42
> > ms
> > sometimes. What concerns me is that may be I am loading too much into one
> > index so that is why this is killing the performance. Is there a
> > recommended index size/document number and size that I should be looking
> > at
> > to tune this? Any other ideas other than increasing the memory size as I
> > have already tried this?
>
> The optimal index size is down to the size of segments on disk. New
> segments are created when hard commits occur, and existing on-disk
> segments may get merged in the background when the segment count gets
> too high. Now, if those on-disk segments get too large, copying them
> around at merge time can get prohibitive, especially if your index is
> changing frequently.
>
> Splitting such an index into shards is one approach to dealing with this
> issue.
>
> Upayavira
>


Re: Solr Pagination

2015-10-25 Thread Salman Ansari
Thanks guys for your responses.

That's a very very large cache size.  It is likely to use a VERY large
amount of heap, and autowarming up to 4096 entries at commit time might
take many *minutes*.  Each filterCache entry is maxDoc/8 bytes.  On an
index core with 70 million documents, each filterCache entry is at least
8.75 million bytes.  Multiply that by 16384, and a completely full cache
would need about 140GB of heap memory.  4096 entries will require 35GB.
 I don't think this cache is actually storing that many entries, or you
would most certainly be running into OutOfMemoryError exceptions.

True; however, I tried with the default filterCache at the beginning,
but the problem was still there, so I don't think that is how I should
increase the performance of my Solr. Moreover, as you mentioned, when I
changed the configuration I should have been running out of memory, but that
did not happen. Do you think my Solr has not picked up the latest configs? I
have restarted Solr, btw.

Lately I have been trying different ways to improve this, and I created
a brand new index on the same machine using 2 shards with few entries
(about 5) and the performance was booming; I got results back in 42 ms
sometimes. What concerns me is that maybe I am loading too much into one
index and that is what is killing the performance. Is there a recommended
index size/document count that I should be looking at to tune this? Any other
ideas besides increasing the memory size, which I have already tried?
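
As a side note, the filterCache arithmetic quoted above is easy to verify; a quick back-of-the-envelope sketch using the numbers from this thread:

```python
# Each filterCache entry is a bitset over the whole core: maxDoc/8 bytes.
max_doc = 70_000_000
entry_bytes = max_doc / 8                  # ~8.75 MB per cached filter

full_cache_gb = entry_bytes * 16384 / 1e9  # a completely full 16384-entry cache
autowarm_gb = entry_bytes * 4096 / 1e9     # heap needed just to autowarm 4096

print(f"{entry_bytes / 1e6:.2f} MB/entry, "
      f"~{full_cache_gb:.0f} GB full, ~{autowarm_gb:.0f} GB autowarm")
```

(With 2 shards of 35M documents each, a single core's entries are half this size, but the conclusion is the same.)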


Regards,
Salman

On Thu, Oct 22, 2015 at 9:18 AM, Toke Eskildsen 
wrote:

> On Wed, 2015-10-14 at 10:17 +0200, Jan Høydahl wrote:
> > I have not benchmarked various number of segments at different sizes
> > on different HW etc, so my hunch could very well be wrong for Salman’s
> case.
> > I don’t know how frequent updates there is to his data either.
> >
> > Have you done #segments benchmarking for your huge datasets?
>
> Only informally. However, the guys at UKWA run a similar scale index and
> have done multiple segment-count-oriented tests. They have not published
> a report, but there are measurements & graphs at
> https://github.com/ukwa/shine/tree/master/python/test-logs
>
> - Toke Eskildsen, State and University Library, Denmark
>
>
>


Analytics using Solr

2015-10-25 Thread Salman Ansari
Hi,

I was wondering if it is possible (and recommended) to run analytics using
Solr; for example, big data analytics. Any ideas?
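
Solr does support aggregation-style analytics natively, e.g. through the JSON Facet API introduced in Solr 5.1. A sketch of a request body (the field names here are hypothetical):

```python
import json

# Nested aggregations: average price overall, plus sum of price per category.
request_body = {
    "query": "*:*",
    "limit": 0,  # only the aggregates are wanted, not the matching documents
    "facet": {
        "avg_price": "avg(price)",
        "by_category": {
            "type": "terms",
            "field": "category",
            "facet": {"total_price": "sum(price)"},
        },
    },
}

# This JSON would be POSTed to /solr/<collection>/query.
payload = json.dumps(request_body)
print(payload)
```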

Regards,
Salman


Re: Filtering on a Field with Suggestion

2015-10-16 Thread Salman Ansari
Thanks for pointing that out, as I am using SolrCloud 5.3. However, it looks
like they are talking about boolean operations in the context field and not
the support of the context field itself. Are you sure that context filtering
is not supported with any lookup prior to 5.4?
On Oct 16, 2015 12:47 PM, "Alessandro Benedetti" <abenede...@apache.org>
wrote:

> This will sound silly, but which version of Solr are you using ?
> According to :
> https://issues.apache.org/jira/browse/SOLR-7888
> This new cool feature will be included in solr 5.4 .
>
> Cheers
>
> On 15 October 2015 at 22:53, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Hi guys,
> >
> > I am working with Solr suggester as explained in this article.
> > https://cwiki.apache.org/confluence/display/solr/Suggester
> >
> > The suggester is working fine but I want to filter the results based on a
> > filed (which is type). I have tried to follow what was written at the end
> > of the article (about Context Filtering) but still could not get the
> filter
> > working.
> >
> > My Solr configuration for suggestion is
> >
> >   <searchComponent name="suggest" class="solr.SuggestComponent">
> >     <lst name="suggester">
> >       <str name="name">mySuggester</str>
> >       <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
> >       <str name="dictionaryImpl">DocumentDictionaryFactory</str>
> >       <str name="field">entity_autocomplete</str>
> >       <str name="contextField">type</str>
> >       <str name="suggestAnalyzerFieldType">text_auto</str>
> >       <str name="buildOnStartup">false</str>
> >     </lst>
> >   </searchComponent>
> >
> > I have two entries for "Bill Gates" one with type=people and the other
> with
> > type=organization.
> >
> > I have tried the following query but still get both records for
> suggestion
> > (The right thing is to get one since I only have one Bill Gates as a type
> > of organization)
> >
> > Here is my query
> >
> > http://
> >
> >
> [MySolr]/[MyCollection]/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=Bill&suggest.cfq=people
> >
> > Any comments why this is not filtering?
> >
> > Regards,
> > Salman
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Filtering on a Field with Suggestion

2015-10-15 Thread Salman Ansari
Hi guys,

I am working with Solr suggester as explained in this article.
https://cwiki.apache.org/confluence/display/solr/Suggester

The suggester is working fine, but I want to filter the results based on a
field (which is type). I have tried to follow what was written at the end
of the article (about Context Filtering) but still could not get the filter
working.

My Solr configuration for suggestion is

  <searchComponent name="suggest" class="solr.SuggestComponent">
    <lst name="suggester">
      <str name="name">mySuggester</str>
      <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
      <str name="dictionaryImpl">DocumentDictionaryFactory</str>
      <str name="field">entity_autocomplete</str>
      <str name="contextField">type</str>
      <str name="suggestAnalyzerFieldType">text_auto</str>
      <str name="buildOnStartup">false</str>
    </lst>
  </searchComponent>

I have two entries for "Bill Gates" one with type=people and the other with
type=organization.

I have tried the following query but still get both records for suggestion
(The right thing is to get one since I only have one Bill Gates as a type
of organization)

Here is my query

http://
[MySolr]/[MyCollection]/suggest?suggest=true&suggest.build=true&suggest.dictionary=mySuggester&suggest.q=Bill&suggest.cfq=people

Any comments why this is not filtering?
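
For reference, the context-filtering request in the Suggester documentation is expressed through dedicated suggest.* parameters, with suggest.cfq carrying the context filter. A sketch mirroring the query above (it assumes a Solr version whose lookup supports context filtering):

```python
from urllib.parse import urlencode

params = {
    "suggest": "true",
    "suggest.build": "true",
    "suggest.dictionary": "mySuggester",
    "suggest.q": "Bill",
    "suggest.cfq": "people",  # filters suggestions by the contextField ("type")
}
url = "http://[MySolr]/[MyCollection]/suggest?" + urlencode(params)
print(url)
```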

Regards,
Salman


Re: AutoComplete Feature in Solr

2015-10-14 Thread Salman Ansari
Actually, what you mentioned, Alessandro, is something interesting for me. I
am looking to boost the ranking of some suggestions based on dynamic
criteria (let's say how frequently they are used). Do I need to update the
boost field each time I request suggestions (to capture the frequency)?
If you can direct me to an article that explains this with some scenarios
of using boost, that would be appreciated.

Regards,
Salman


On Wed, Oct 14, 2015 at 11:49 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> using the suggester feature you can in some case rank the suggestions based
> on an additional numeric field.
> It's not your use case, you actually want to use a search handler with a
> well defined schema that will allow you for example to query on an edge
> ngram token filtered field, applying a geo distance boost function.
>
> This is what i would use and would work fine with your applied filter
> queries as well ( reducing the space of Suggestions)
>
> Cheers
>
> On 14 October 2015 at 05:09, William Bell <billnb...@gmail.com> wrote:
>
> > We want to use suggester but also want to show those results closest to
> my
> > lat,long... Kinda combine suggester and bq=geodist()
> >
> > On Mon, Oct 12, 2015 at 2:24 PM, Salman Ansari <salman.rah...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > > I have been trying to get the autocomplete feature in Solr working with
> > no
> > > luck up to now. First I read that "suggest component" is the
> recommended
> > > way as in the below article (and this is the exact functionality I am
> > > looking for, which is to autocomplete multiple words)
> > >
> > >
> >
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> > >
> > > Then I tried implementing suggest as described in the following
> articles
> > in
> > > this order
> > > 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
> > > 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
> > > implemented suggesting phrases)
> > > 3)
> > >
> > >
> >
> http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms
> > >
> > > With no luck, after implementing each article when I run my query as
> > > http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack
> > >
> > >
> > >
> > > I get
> > > <response>
> > >   <lst name="responseHeader">
> > >     <int name="status">0</int>
> > >     <int name="QTime">0</int>
> > >   </lst>
> > > </response>
> > >
> > >  Although I have an entry for Barack Obama in my index. I am posting my
> > > Solr configuration as well
> > >
> > > <searchComponent class="solr.SpellCheckComponent" name="suggest">
> > >  <lst name="spellchecker">
> > >   <str name="name">suggest</str>
> > >   <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> > >   <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
> > >   <str name="field">entity_autocomplete</str>
> > >   <str name="buildOnCommit">true</str>
> > >  </lst>
> > > </searchComponent>
> > >
> > > <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
> > >  <lst name="defaults">
> > >   <str name="spellcheck">true</str>
> > >   <str name="spellcheck.dictionary">suggest</str>
> > >   <str name="spellcheck.count">10</str>
> > >   <str name="spellcheck.onlyMorePopular">true</str>
> > >   <str name="spellcheck.collate">false</str>
> > >  </lst>
> > >  <arr name="components">
> > >   <str>suggest</str>
> > >  </arr>
> > > </requestHandler>
> > >
> > > It looks like a very simple job, but even after following so many
> > articles,
> > > I could not get it right. Any comment will be appreciated!
> > >
> > > Regards,
> > > Salman
> > >
> >
> >
> >
> > --
> > Bill Bell
> > billnb...@gmail.com
> > cell 720-256-8076
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: AutoComplete Feature in Solr

2015-10-13 Thread Salman Ansari
Thanks guys, I was able to make it work using your articles. The key point
mentioned in one of the articles was that the suggest component comes
preconfigured in the techproducts sample. I started my work from there and
tweaked it to suit my needs. Thanks a lot!

One thing still remaining: I don't find support for "suggest" in
Solr.NET. What I found is that we should use spell check, but that is not
the recommended option as per the articles. The spell check component in
Solr.NET will use the /spell handler while I have configured suggestions
using the /suggest handler. It is easy to handle it myself as well, but I was
just wondering if Solr.NET supports the suggest component somehow.

Regards,
Salman

On Tue, Oct 13, 2015 at 2:39 PM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> As Erick suggested you are reading a really old way to provide the
> autocomplete feature !
> Please take a read to the docs Erick linked and to my blog as well.
> It will definitely give you more insight about the Autocomplete world !
>
> Cheers
>
> [1] http://alexbenedetti.blogspot.co.uk/2015/07/solr-you-complete-me.html
>
> On 12 October 2015 at 21:24, Salman Ansari <salman.rah...@gmail.com>
> wrote:
>
> > Hi,
> >
> > I have been trying to get the autocomplete feature in Solr working with
> no
> > luck up to now. First I read that "suggest component" is the recommended
> > way as in the below article (and this is the exact functionality I am
> > looking for, which is to autocomplete multiple words)
> >
> >
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> >
> > Then I tried implementing suggest as described in the following articles
> in
> > this order
> > 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
> > 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
> > implemented suggesting phrases)
> > 3)
> >
> >
> http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms
> >
> > With no luck, after implementing each article when I run my query as
> > http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack
> >
> >
> >
> > I get
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">0</int>
> >   </lst>
> > </response>
> >
> >  Although I have an entry for Barack Obama in my index. I am posting my
> > Solr configuration as well
> >
> > <searchComponent class="solr.SpellCheckComponent" name="suggest">
> >  <lst name="spellchecker">
> >   <str name="name">suggest</str>
> >   <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >   <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
> >   <str name="field">entity_autocomplete</str>
> >   <str name="buildOnCommit">true</str>
> >  </lst>
> > </searchComponent>
> >
> > <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
> >  <lst name="defaults">
> >   <str name="spellcheck">true</str>
> >   <str name="spellcheck.dictionary">suggest</str>
> >   <str name="spellcheck.count">10</str>
> >   <str name="spellcheck.onlyMorePopular">true</str>
> >   <str name="spellcheck.collate">false</str>
> >  </lst>
> >  <arr name="components">
> >   <str>suggest</str>
> >  </arr>
> > </requestHandler>
> >
> > It looks like a very simple job, but even after following so many
> articles,
> > I could not get it right. Any comment will be appreciated!
> >
> > Regards,
> > Salman
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


AutoComplete Feature in Solr

2015-10-12 Thread Salman Ansari
Hi,

I have been trying to get the autocomplete feature in Solr working with no
luck up to now. First I read that "suggest component" is the recommended
way as in the below article (and this is the exact functionality I am
looking for, which is to autocomplete multiple words)
http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/

Then I tried implementing suggest as described in the following articles in
this order
1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
implemented suggesting phrases)
3)
http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms

With no luck, after implementing each article when I run my query as
http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack



I get

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
</response>

Although I have an entry for Barack Obama in my index. I am posting my
Solr configuration as well:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
 <lst name="spellchecker">
  <str name="name">suggest</str>
  <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
  <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
  <str name="field">entity_autocomplete</str>
  <str name="buildOnCommit">true</str>
 </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
 <lst name="defaults">
  <str name="spellcheck">true</str>
  <str name="spellcheck.dictionary">suggest</str>
  <str name="spellcheck.count">10</str>
  <str name="spellcheck.onlyMorePopular">true</str>
  <str name="spellcheck.collate">false</str>
 </lst>
 <arr name="components">
  <str>suggest</str>
 </arr>
</requestHandler>

It looks like a very simple job, but even after following so many articles,
I could not get it right. Any comment will be appreciated!
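
One thing worth checking with this spellcheck-based setup: the suggester dictionary must have been built before it returns anything, so an empty response can simply mean the build never happened (buildOnCommit only fires on a commit). A sketch of the build-then-query requests, using the same core name as above:

```python
from urllib.parse import urlencode

base = "http://[MySolr]:8983/solr/entityStore114/suggest?"

# Build the suggester's lookup structure once...
build_url = base + urlencode({"spellcheck.build": "true"})

# ...then query it.
query_url = base + urlencode({"spellcheck.q": "Barack"})

print(build_url)
print(query_url)
```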

Regards,
Salman


Re: Solr Pagination

2015-10-10 Thread Salman Ansari
Regarding the Solr performance issue I was facing, I upgraded my Solr machine
to have
8 cores
56 GB RAM
8 GB JVM

However, unfortunately, I am still getting delays. I have run

* the query "Football" with start=0 and rows=10 and it took around 7.329
seconds
* the query "Football" with start=1000 and rows=10 and it took around
21.994 seconds
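
The start=1000 case is the classic deep-paging cost: each shard must collect and sort start+rows candidates on every request. Since Solr 4.7 the cursorMark parameter sidesteps this; a sketch with the HTTP call stubbed out (`fetch` stands in for the real request):

```python
def page_all(fetch, rows=10):
    """Walk an entire result set with cursorMark instead of a growing start=N."""
    cursor, docs = "*", []
    while True:
        resp = fetch({
            "q": "content_text:Football",
            # cursorMark requires a sort ending on the uniqueKey field
            "sort": "score desc, id asc",
            "rows": rows,
            "cursorMark": cursor,
        })
        docs.extend(resp["response"]["docs"])
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # unchanged cursor: result set exhausted
            return docs
        cursor = next_cursor
```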

I was looking at the Solr admin and saw that the RAM and JVM are not being
utilized to the maximum, not even half or a quarter. How do I push data into
the cache once Solr starts? And is pushing data into the cache the right
strategy to solve the issue?

Appreciate your comments.

Regards,
Salman



On Sat, Oct 10, 2015 at 11:55 AM, Salman Ansari <salman.rah...@gmail.com>
wrote:

> Thanks Shawn for your response. Based on that
> 1) Can you please direct me where I can get more information about cold
> shard vs hot shard?
>
> 2)  That 10GB number assumes there's no other software on the machine,
> like a database server or a webserver.
> Yes the machine is dedicated for Solr
>
> 3) How much index data is on the machine?
> I have 3 collections 2 for testing (so the aggregate of both of them does
> not exceed 1M document) and the main collection that I am querying now
> which contains around 69M. I have distributed all my collections into 2
> shards each with 2 replicas. The consumption on the hard disk is about 40GB.
>
> 4) A memory size of 14GB would be unusual for a physical machine, and
> makes me wonder if you're using virtual machines
> Yes I am using virtual machine as using a bare metal will be difficult in
> my case as all of our data center is on the cloud. I can increase its
> capacity though. While testing some edge cases on Solr, I realized on Solr
> admin that the memory sometimes reaches to its limit (14GB RAM, and 4GB JVM)
>
> 5) Just to confirm, I have combined the lessons from
>
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
> AND
> https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache
>
> to come up with the following settings
>
> FilterCache
>
> <filterCache class="solr.FastLRUCache"
>              size="16384"
>              initialSize="4096"
>              autowarmCount="4096"/>
>
> DocumentCache
>
> <documentCache class="solr.LRUCache"
>                size="16384"
>                initialSize="16384"
>                autowarmCount="0"/>
>
> NewSearcher and FirstSearcher
>
> <listener event="newSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">*</str>
>       <str name="sort">score desc id desc</str>
>     </lst>
>   </arr>
> </listener>
> <listener event="firstSearcher" class="solr.QuerySenderListener">
>   <arr name="queries">
>     <lst>
>       <str name="q">*</str>
>       <str name="sort">score desc id desc</str>
>     </lst>
>     <lst>
>       <str name="q">*</str>
>       <str name="facet.field">category</str>
>     </lst>
>   </arr>
> </listener>
>
> Will this make Solr use more cache and prepopulate it?
>
> Regards,
> Salman
>
>
>
>
> On Sat, Oct 10, 2015 at 5:10 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>
>> On 10/9/2015 1:39 PM, Salman Ansari wrote:
>>
>> > INFO  - 2015-10-09 18:46:17.953; [c:sabr102 s:shard1 r:core_node2
>> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
>> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=0&q=(content_text:Football)&rows=10} hits=24408 status=0
>> > QTime=3391
>>
>> Over 3 seconds for a query like this definitely sounds like there's a
>> problem.
>>
>> > INFO  - 2015-10-09 18:47:04.727; [c:sabr102 s:shard1 r:core_node2
>> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
>> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=1000&q=(content_text:Football)&rows=10} hits=24408
>> status=0
>> > QTime=21569
>>
>> Adding a start value of 1000 increases QTime by a factor of more than
>> 6?  Even more evidence of a performance problem.
>>
>> For comparison purposes, I did a couple of simple queries on a large
>> index of mine.  Here are the response headers showing the QTime value
>> and all the parameters (except my shard URLs) for each query:
>>
>>   "responseHeader": {
>> "status": 0,
>> "QTime": 1253,
>> "params": {
>>   "df": "catchall",
>>   "spellcheck.maxCollationEvaluations": "2",
>>   "spellcheck.dictionary": "default",
>>   "echoParams": "all",
>>   "spellcheck.maxCollations": "5",
>>   "q.op": "AND",
>>   "shards.info": "true",
>>   "spellcheck.maxCollationTries": "2",
>>   "rows": "70"

Re: Solr Pagination

2015-10-10 Thread Salman Ansari
Thanks Shawn for your response. Based on that
1) Can you please direct me where I can get more information about cold
shard vs hot shard?

2)  That 10GB number assumes there's no other software on the machine, like
a database server or a webserver.
Yes the machine is dedicated for Solr

3) How much index data is on the machine?
I have 3 collections, 2 for testing (the aggregate of both does not exceed
1M documents) and the main collection that I am querying now, which contains
around 69M documents. I have distributed all my collections into 2 shards,
each with 2 replicas. The consumption on the hard disk is about 40GB.

4) A memory size of 14GB would be unusual for a physical machine, and makes me
wonder if you're using virtual machines
Yes, I am using a virtual machine, as using bare metal would be difficult in
my case since all of our data center is in the cloud. I can increase its
capacity though. While testing some edge cases on Solr, I noticed in the Solr
admin that the memory sometimes reaches its limit (14GB RAM, and 4GB JVM).

5) Just to confirm, I have combined the lessons from

http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
AND
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

to come up with the following settings

FilterCache

<filterCache class="solr.FastLRUCache"
             size="16384"
             initialSize="4096"
             autowarmCount="4096"/>

DocumentCache

<documentCache class="solr.LRUCache"
               size="16384"
               initialSize="16384"
               autowarmCount="0"/>

NewSearcher and FirstSearcher

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*</str>
      <str name="sort">score desc id desc</str>
    </lst>
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*</str>
      <str name="sort">score desc id desc</str>
    </lst>
    <lst>
      <str name="q">*</str>
      <str name="facet.field">category</str>
    </lst>
  </arr>
</listener>

Will this make Solr use more cache and prepopulate it?

Regards,
Salman




On Sat, Oct 10, 2015 at 5:10 AM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 10/9/2015 1:39 PM, Salman Ansari wrote:
>
> > INFO  - 2015-10-09 18:46:17.953; [c:sabr102 s:shard1 r:core_node2
> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=0&q=(content_text:Football)&rows=10} hits=24408 status=0
> > QTime=3391
>
> Over 3 seconds for a query like this definitely sounds like there's a
> problem.
>
> > INFO  - 2015-10-09 18:47:04.727; [c:sabr102 s:shard1 r:core_node2
> > x:sabr102_shard1_replica1] org.apache.solr.core.SolrCore;
> > [sabr102_shard1_replica1] webapp=/solr path=/select
> > params={start=1000&q=(content_text:Football)&rows=10} hits=24408 status=0
> > QTime=21569
>
> Adding a start value of 1000 increases QTime by a factor of more than
> 6?  Even more evidence of a performance problem.
>
> For comparison purposes, I did a couple of simple queries on a large
> index of mine.  Here are the response headers showing the QTime value
> and all the parameters (except my shard URLs) for each query:
>
>   "responseHeader": {
> "status": 0,
> "QTime": 1253,
> "params": {
>   "df": "catchall",
>   "spellcheck.maxCollationEvaluations": "2",
>   "spellcheck.dictionary": "default",
>   "echoParams": "all",
>   "spellcheck.maxCollations": "5",
>   "q.op": "AND",
>   "shards.info": "true",
>   "spellcheck.maxCollationTries": "2",
>   "rows": "70",
>   "spellcheck.extendedResults": "false",
>   "shards": "REDACTED SEVEN SHARD URLS",
>   "shards.tolerant": "true",
>   "spellcheck.onlyMorePopular": "false",
>   "facet.method": "enum",
>   "spellcheck.count": "9",
>   "q": "catchall:carriage",
>   "indent": "true",
>   "wt": "json",
>   "_": "120900498"
> }
>
>
>   "responseHeader": {
> "status": 0,
> "QTime": 176,
> "params": {
>   "df": "catchall",
>   "spellcheck.maxCollationEvaluations": "2",
>   "spellcheck.dictionary": "default",
>   "echoParams": "all",
>   "spellcheck.maxCollations": "5",
>   "q.op": "AND",
>   "shards.info": "true",
>   "spellcheck.maxCollationTries": "2",
>   "rows": "70",
>   "spellcheck.extendedResults": "false",
>   "shards": "REDACTED SEVEN SHARD URLS",
>   "shards.tolerant": "true",
>   "spellcheck.onlyMorePopular"

Re: Solr Pagination

2015-10-09 Thread Salman Ansari
Thanks Erick for your response. If you find pagination is not the main
culprit, what other factors do you guys suggest I need to tweak to test
that? As I mentioned, when navigating deep into the results using start and
row I am getting a timeout from Solr.NET, and I need a way to fix that.

You suggested that a 4GB JVM is not enough; I have seen MapQuest going with
a 10GB JVM, as mentioned here:
http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
and they were getting 140 ms response times for 10 billion documents. Not
sure how many shards they had, though. With data of around 70M documents,
how many shards do you guys suggest I use, and how much should I dedicate
to RAM and the JVM?

Regards,
Salman

On Fri, Oct 9, 2015 at 6:37 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> I think paging is something of a red herring. You say:
>
> bq: but still I get delays of around 16 seconds and sometimes even more.
>
> Even for a start of 1,000, this is ridiculously long for Solr. All
> you're really saving
> here is keeping a record of the id and score for a list 1,000 cells
> long (or even
> 20,000 assuming 1,000 pages and 20 docs/page). That's somewhat wasteful,
> but it's still hard to believe it's responsible for what you're seeing.
>
> Having 4G of RAM for 70M docs is very little memory, assuming this is on
> a single shard.
>
> So my suspicion is that you have something fundamentally slow about
> your system, the additional overhead shouldn't be as large as you're
> reporting.
>
> And I'll second Toke's comment. It's very rare that users see anything
> _useful_ by navigating that deep. Make them hit next next next and they'll
> tire out way before that.
>
> Cursor mark's sweet spot is handling some kind of automated process that
> goes through the whole result set. It'll work for what you're trying
> to do though.
>
> Best,
> Erick
>
> On Fri, Oct 9, 2015 at 8:27 AM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Is this a real problem or a worry? Do you have users that page really
> deep
> > and if so, have you considered other mechanisms for delivering what they
> > need?
> >
> > The issue is that currently I have around 70M documents and some generic
> > queries are resulting in lots of pages. Now if I try deep navigation (to
> > page# 1000 for example), a lot of times the query takes so long that
> > Solr.NET throws an operation timeout exception. The first page is
> > relatively faster to load, but it does take around a few seconds as well.
> > After reading some documentation I realized that cursors could help, and
> > they do. I have tried the following to get better performance:
> >
> > 1) Used cursors instead of start and row
> > 2) Increased the RAM on my Solr machine to 14GB
> > 3) Increase the JVM on that machine to 4GB
> > 4) Increased the filterCache
> > 5) Increased the documentCache
> > 6) Run Optimize on the Solr Admin
> >
> > but still I get delays of around 16 seconds and sometimes even more.
> > What other mechanisms do you suggest I should use to handle this issue?
> >
> > While pagination is faster than increasing the start parameter, the
> > difference is small as long as you stay below a start of 1000. 10K might
> > also work for you. Do your users page beyond that?
> > I can limit users to no more than 10K, but I still think that at that level
> > cursors will be much faster than increasing the start variable, as
> explained
> > here (
> https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
> > ), have you tried both ways on your collection and it was giving you
> > similar results?
> >
> > On Fri, Oct 9, 2015 at 5:20 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
> > wrote:
> >
> >> Salman Ansari <salman.rah...@gmail.com> wrote:
> >>
> >> [Pagination with cursors]
> >>
> >> > For example, what happens if the user navigates from page 1 to page 2,
> >> > does the front end  need to store the next cursor at each query?
> >>
> >> Yes.
> >>
> >> > What about going to a previous page, do we need to store all cursors
> >> > that have been navigated up to now at the client side?
> >>
> >> Yes, if you want to provide that functionality.
> >>
> >> Is this a real problem or a worry? Do you have users that page really
> deep
> >> and if so, have you considered other mechanisms for delivering what they
> >> need?
> >>
> >> While pagination is faster than increasing the start parameter, the
> >> difference is small as long as you stay below a start of 1000. 10K might
> >> also work for you. Do your users page beyond that?
> >>
> >> - Toke Eskildsen
> >>
>
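To put rough numbers on Erick's point about the id/score bookkeeping: with start and rows, each shard must track the top start+rows (id, score) entries before a page can be returned, while a cursor only ever tracks rows entries. A back-of-the-envelope sketch (the bytes-per-entry figure is an illustrative assumption, not a measured Solr value):

```python
# Rough cost of deep paging with start/rows vs. cursorMark.
# Bytes-per-entry is an illustrative assumption (doc id + float score),
# not a measured Solr figure.

BYTES_PER_ENTRY = 12  # assumed: 8-byte id + 4-byte score

def entries_collected(start, rows, use_cursor=False):
    """Top-N entries each shard must track to answer one page."""
    return rows if use_cursor else start + rows

for start in (0, 1000, 20000):
    deep = entries_collected(start, rows=20)
    cur = entries_collected(start, rows=20, use_cursor=True)
    print(f"start={start:>6}: start/rows tracks {deep:>6} entries "
          f"(~{deep * BYTES_PER_ENTRY} bytes), cursor tracks {cur}")
```

Even 20,000 entries is tiny by this estimate, which is why a 16-second query points at something other than paging overhead.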


Solr Pagination

2015-10-09 Thread Salman Ansari
Hi guys,

I have been working with Solr and Solr.NET for some time for a big project
that requires around 300M documents. Consequently, I faced an issue and I
am highlighting it here in case you have any comments:

As mentioned here (
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results),
cursors are introduced to solve the problem of pagination. However, I was
not able to find an example to do proper handling of page navigation with
multiple users. For example, what happens if the user navigates from page 1
to page 2, does the front end need to store the next cursor at each query?
What about going to a previous page, do we need to store all cursors that
have been navigated up to now at the client side? Any comments/samples on
how proper pagination should be handled using cursors?

Regards,
Salman
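One way to structure the client-side bookkeeping the question describes is a per-user list of cursor marks, one per visited page. The sketch below uses a stubbed fetch function standing in for a real Solr (or Solr.NET) request; a real client would send q, a sort ending in a unique field, rows, and cursorMark, and read nextCursorMark back from the response:

```python
# Sketch: per-user cursorMark bookkeeping for next/previous paging.
# fetch_page is a stub; a real client would query Solr with the given
# cursorMark and read nextCursorMark from the response.

def fetch_page(cursor_mark):
    # Stub: pretend each page's nextCursorMark extends the current one.
    return {"docs": ["..."], "nextCursorMark": cursor_mark + "+"}

class CursorPager:
    def __init__(self):
        self.marks = ["*"]  # marks[i] = cursorMark that fetches page i+1
        self.current = 0    # index of the next page to fetch (0-based)

    def next_page(self):
        resp = fetch_page(self.marks[self.current])
        self.current += 1
        if len(self.marks) == self.current:
            self.marks.append(resp["nextCursorMark"])  # remember for later
        return resp["docs"]

    def prev_page(self):
        # Step back two: one for the page just shown, one for the previous.
        self.current = max(self.current - 2, 0)
        return self.next_page()

pager = CursorPager()
pager.next_page()  # page 1 (cursorMark "*"), stores mark for page 2
pager.next_page()  # page 2
pager.prev_page()  # back to page 1 using the stored mark
print(pager.marks)  # ['*', '*+', '*++']
```

So yes: "previous page" means keeping every cursor seen so far (a few bytes per page, in session state or the URL), since cursors only move forward.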


Re: Solr Pagination

2015-10-09 Thread Salman Ansari
I agree 10B will not be residing on the same machine :)

About the other issue you raised: while submitting queries to Solr I kept a
close eye on RAM and JVM consumption in the Solr admin, and for the initial
queries that were taking most of the time, neither RAM nor the JVM was
hitting its limit, so I doubt that is the problem. For reference, I did have
an issue with the JVM raising an "Out of Memory" exception when it was
around 500MB, but then I raised the machine capacity to 14GB RAM and 4GB
JVM. I have read here (
https://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache) that
for best performance I should be able to fit my entire collection in
memory. Does that sound reasonable?
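The arithmetic behind that wiki advice, using the figures from this thread (14GB RAM, 4GB heap, ~40GB index); the OS overhead allowance is an assumption for illustration:

```python
# Back-of-the-envelope OS disk cache sizing, using figures from this thread.
# The OS/overhead allowance is an assumption for illustration.

total_ram_gb = 14
solr_heap_gb = 4
os_overhead_gb = 1   # assumed allowance for the OS and other processes
index_size_gb = 40

available_for_disk_cache = total_ram_gb - solr_heap_gb - os_overhead_gb
fraction_cached = available_for_disk_cache / index_size_gb

print(f"RAM left for OS disk cache: ~{available_for_disk_cache} GB")
print(f"Fraction of the index that can be cached: ~{fraction_cached:.0%}")
```

With only roughly a quarter of the index fitting in the disk cache, cold queries can hit disk heavily, which is consistent with multi-second QTimes.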

As for the logs, I searched for "Salman" with rows=10 and start=1000 and it
took about 29 seconds to complete. However, each shard took much less, as
shown in the log entries below:

INFO  - 2015-10-09 16:43:39.170; [c:sabr102 s:shard1 r:core_node4
x:sabr102_shard1_replica2] org.apache.solr.core.SolrCore;
[sabr102_shard1_replica2] webapp=/solr path=/select
params={distrib=false=javabin=2=1010=text=id=score=http://
[MySolrIP]:8983/solr/sabr102_shard1_replica1/|
http://100.114.184.37:7574/solr/sabr102_shard1_replica2/=109019061=0=4=(content_text:Salman)=true=true=false}
hits=1819 status=0 QTime=91

INFO  - 2015-10-09 16:44:08.116; [c:sabr102 s:shard1 r:core_node4
x:sabr102_shard1_replica2] org.apache.solr.core.SolrCore;
[sabr102_shard1_replica2] webapp=/solr path=/select
params={ids=584673511333089281,584680513887010816,584697461744111616,584668540118044672,583299685516984320=false=javabin=2=10=text=
http://100.114.184.37:8983/solr/sabr102_shard1_replica1/|http://[MySolrIP]:7574/solr/sabr102_shard1_replica2/=109019061=1000=64=(content_text:Salman)=true=false}
status=0 QTime=4

The search on the second shard started AFTER 29 seconds. Is there any logic
behind what I am seeing here?

Moreover, I do understand that everyone's needs are different and I do need
to prototype, but there must be strategies to follow even when prototyping;
that is what I am looking forward to hearing from you and the community. My
concurrent user count is not that high, but I do have a good amount of data
to be stored/indexed in Solr, and even one user being unable to execute
queries efficiently will be problematic.

Regards,
Salman

On Fri, Oct 9, 2015 at 7:06 PM, Erick Erickson <erickerick...@gmail.com>
wrote:

> bq: 10GB JVM as mentioned here...and they were getting 140 ms response
> time for 10 Billion documents
>
> This simply could _not_ work in a single shard as there's a hard 2B
> doc limit per shard. On slide 14
> it states "both collections are sharded". They are not fitting 10B
> docs in 10G of JVM on a single
> machine. Trust me on this ;). The slides do not state how many shards
> they've
> split their collection into, but I suspect it's a bunch. Each
> application is different enough that the
> numbers wouldn't translate anyway...
>
> 70M docs can fit on a single shard with quite good response time, but
> YMMV. You simply
> have to experiment. Here's a long blog on the subject:
>
> https://lucidworks.com/blog/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/
>
> Start with a profiler and see where you're spending your time. My
> first guess is that
> you're spending a lot of CPU cycles in garbage collection. This
> sometimes happens
> when you are running near your JVM limit, a GC kicks in and recovers a
> tiny bit of memory
> and then initiates another GC cycle immediately. Turn on GC logging
> and take a look
> at the stats provided, see:
> https://lucidworks.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/
>
> Tens of seconds is entirely unexpected though. Do the Solr logs point
> to anything happening?
>
> Best,
> Erick
>
> On Fri, Oct 9, 2015 at 8:51 AM, Salman Ansari <salman.rah...@gmail.com>
> wrote:
> > Thanks Erick for your response. If you find pagination is not the main
> > culprit, what other factors do you guys suggest I need to tweak to test
> > that? As I mentioned, when navigating deep into the results using start
> > and row I am getting a timeout from Solr.NET and I need a way to fix that.
> >
> > You suggested that 4GB JVM is not enough, I have seen MapQuest going with
> > 10GB JVM as mentioned here
> >
> http://www.slideshare.net/lucidworks/high-performance-solr-and-jvm-tuning-strategies-used-for-map-quests-search-ahead-darren-spehr
> > and they were getting 140 ms response time for 10 Billion documents. Not
> > sure how many shards they had though. With data of around 70M documents,
> > what do you guys suggest as how many shards should I use and how much
> > should I dedicate for RAM and JVM?
> >
> > Regards,
> > Salman
> >
>
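The shard arithmetic Erick alludes to is easy to sketch: Lucene's hard limit is roughly 2 billion documents per shard (2^31 - 1), so 10B documents cannot possibly live in one shard, while 70M fits comfortably. A small sketch (the totals are the figures discussed in this thread):

```python
# Minimum shard counts implied by Lucene's hard per-shard document limit.
import math

MAX_DOCS_PER_SHARD = 2**31 - 1  # hard Lucene limit per shard/core

def min_shards(total_docs):
    """Smallest shard count that keeps every shard under the hard limit."""
    return math.ceil(total_docs / MAX_DOCS_PER_SHARD)

print(min_shards(10_000_000_000))  # 10B docs cannot fit in one shard
print(min_shards(70_000_000))      # 70M docs fit in a single shard
```

The hard limit only gives a floor; the practical shard count for good latency has to come from prototyping, as the linked sizing blog argues.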

Re: Solr Pagination

2015-10-09 Thread Salman Ansari
> The total response is 29 seconds. I do note that one of your
> queries has rows=1010, a typo?
>
> Anyway, not at all sure what's going on here. If these are gigantic files
> you're
> returning, then it could be decompressing time, unlikely but possible.
>
> Try again with rows=0&start=1000 to see if it's something weird with
> getting
> the stored data, but that's highly doubtful.
>
> I think the only real way to get to the bottom of it will be to slap a
> profiler
> on it and see where the time is being spent.
>
> Best,
> Erick
>
> On Fri, Oct 9, 2015 at 9:53 AM, Toke Eskildsen <t...@statsbiblioteket.dk>
> wrote:
> > Salman Ansari <salman.rah...@gmail.com> wrote:
> >> Thanks Erick for your response. If you find pagination is not the main
> >> culprit, what other factors do you guys suggest I need to tweak to test
> >> that?
> >
> > Well, is basic search slow? What are your response times for plain
> un-warmed top-20 searches?
> >
> >> As I mentioned, when navigating deep into the results using start and
> >> row I am getting a timeout from Solr.NET and I need a way to fix that.
> >
> > You still haven't answered my question: Do your users actually need to
> page that far?
> >
> >
> > Again: I know there can be 10 million results. Why would your users need
> to page through all of them? Why would they need to page trough just the
> first 1000? What are they trying to achieve?
> >
> > If they used it automatically for full export of the result set, then I
> can understand it, but you talked about next & previous page, which
> indicates that this is a manual process. A manual process that requires
> clicking next 1000 times is a severe indicator that something can be done
> differently.
> >
> > - Toke Eskildsen
>


Re: Solr Pagination

2015-10-09 Thread Salman Ansari
Is this a real problem or a worry? Do you have users that page really deep
and if so, have you considered other mechanisms for delivering what they
need?

The issue is that currently I have around 70M documents and some generic
queries are resulting in lots of pages. Now if I try deep navigation (to
page 1000, for example), a lot of times the query takes so long that
Solr.NET throws an operation timeout exception. The first page is relatively
faster to load, but it does take around a few seconds as well. After reading
some documentation I realized that cursors could help, and they do. I have
tried the following to get better performance:

1) Used cursors instead of start and row
2) Increased the RAM on my Solr machine to 14GB
3) Increase the JVM on that machine to 4GB
4) Increased the filterCache
5) Increased the documentCache
6) Run Optimize on the Solr Admin

but still I get delays of around 16 seconds and sometimes even more.
What other mechanisms do you suggest I should use to handle this issue?

While pagination is faster than increasing the start parameter, the
difference is small as long as you stay below a start of 1000. 10K might
also work for you. Do your users page beyond that?
I can limit users to no more than 10K, but I still think that at that level
cursors will be much faster than increasing the start variable, as explained
here (https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results
), have you tried both ways on your collection and it was giving you
similar results?

On Fri, Oct 9, 2015 at 5:20 PM, Toke Eskildsen <t...@statsbiblioteket.dk>
wrote:

> Salman Ansari <salman.rah...@gmail.com> wrote:
>
> [Pagination with cursors]
>
> > For example, what happens if the user navigates from page 1 to page 2,
> > does the front end  need to store the next cursor at each query?
>
> Yes.
>
> > What about going to a previous page, do we need to store all cursors
> > that have been navigated up to now at the client side?
>
> Yes, if you want to provide that functionality.
>
> Is this a real problem or a worry? Do you have users that page really deep
> and if so, have you considered other mechanisms for delivering what they
> need?
>
> While pagination is faster than increasing the start parameter, the
> difference is small as long as you stay below a start of 1000. 10K might
> also work for you. Do your users page beyond that?
>
> - Toke Eskildsen
>