Block Join Faceting in Solr 7.2

2018-08-06 Thread Aditya Gandhi
I'm querying an index which has two types of child documents (let's call
them ChildTypeA and ChildTypeB).
I wrap the subqueries for each of these document types in a boolean clause,
something like this:

q=+{!parent which=type:parent}<subquery on ChildTypeA> +{!parent which=type:parent}<subquery on ChildTypeB>


I've been trying to get facet counts on documents of ChildTypeA (rolled up
by parent) and I've tried the following approaches


   - Tried Block Join Faceting using the JSON API, i.e. the unique(_root_) approach.
      - Enabled docValues on _root_.
      - *This did not scale well.*
   - Tried using the BlockJoinFacet component.
      - Had to customize it, since it expects only one *ToParentBlockJoinQuery*
        clause to be present in the query.
      - Since I needed facet counts only on ChildTypeA, I changed it to ignore
        the clause on ChildTypeB.
      - I did not enable docValues on _root_, since that was not mentioned in
        the documentation.
      - *This approach did not scale well.*

I needed advice on whether I could have done anything better in either of
the two approaches I've tried so far, and also whether there is some other
approach I could try.
Would using uniqueBlock in 7.4 help? (Though this would require me to
upgrade my Solr version.)
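
For reference, a minimal sketch of the unique(_root_) rollup with the JSON
Facet API (the collection name, the "category" field and the type filter are
placeholders, not details from the original setup):

curl 'http://localhost:8983/solr/mycollection/query' -H 'Content-Type: application/json' -d '
{
  "query": "type:childTypeA",
  "facet": {
    "categories": {
      "type": "terms",
      "field": "category",
      "facet": { "parents": "unique(_root_)" }
    }
  }
}'
# On 7.4+ the aggregation can be swapped for "uniqueBlock(_root_)", which is the
# cheaper variant because it exploits the parent/child block co-location.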


Solr client

2017-08-28 Thread Aditya
Hi

I am aggregating open source Solr client libraries across all languages.
Below are the links. Very few projects are currently active; most of them
were last updated a few years ago. Please give me pointers if I missed
any Solr client library.

http://www.findbestopensource.com/tagged/solr-client
http://www.findbestopensource.com/tagged/solr-gui


Regards
Ganesh

PS: The website http://www.findbestopensource.com search is powered by Solr.


Re: Multilevel grouping?

2016-07-14 Thread Aditya Sundaram
Thanks Yonik, I was looking for exactly that. Is there any workaround to
achieve it currently?

On Tue, Jul 12, 2016 at 5:07 PM, Yonik Seeley  wrote:

> I started this a while ago, but haven't found the time to finish:
> https://issues.apache.org/jira/browse/SOLR-7830
>
> -Yonik
>
>
> On Tue, Jul 12, 2016 at 7:29 AM, Aditya Sundaram
>  wrote:
> > Does solr support multilevel grouping? I want to group upto 2/3 levels
> > based on different fields i.e 1st group on field one, within which i
> group
> > by field 2 etc.
> > I am aware of facet.pivot which does the same but retrieves only the
> count.
> > Is there anyway to get the documents as well along with the count in
> > facet.pivot???
> >
> > --
> > Aditya Sundaram
>



-- 
Aditya Sundaram
Software Engineer, Technology team
AKR Tech park B Block, B1 047
+91-9844006866


Multilevel grouping?

2016-07-12 Thread Aditya Sundaram
Does Solr support multilevel grouping? I want to group up to 2-3 levels
based on different fields, i.e. first group on field one, and within each
group, group by field two, etc.
I am aware of facet.pivot, which does the same but retrieves only the counts.
Is there any way to get the documents as well, along with the counts, in
facet.pivot?
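
For reference, a hedged sketch of the facet.pivot request being described,
plus the follow-up query that is commonly used to pull the documents of one
pivot bucket (field and collection names are placeholders):

# nested counts only
curl 'http://localhost:8983/solr/mycollection/select?q=*:*&rows=0&facet=true&facet.pivot=field1,field2&wt=json'

# documents for one specific bucket, e.g. field1=A and field2=B
curl 'http://localhost:8983/solr/mycollection/select?q=*:*&fq=field1:A&fq=field2:B&rows=20&wt=json'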

-- 
Aditya Sundaram


Re: Same origin policy for Apache Solr 5.5

2016-04-04 Thread Aditya Desai
Hello Upayavira

I am trying to build an application that gets data from an independent,
stand-alone Solr 4.10 and then displays that data on a global map. So
effectively there are two Solrs: one is independent (4.10) and the other one
has the Map APIs (Solr 5.5 here). I want to give customers my entire Solr 5.5
package, and they just need to point it at the collections present in any
Solr (here Solr 4.10). Does this help?

On Mon, Apr 4, 2016 at 9:11 AM, Upayavira  wrote:

> Why would you want to do this?
>
> On Sun, 3 Apr 2016, at 04:15 AM, Aditya Desai wrote:
> > Hello SOLR Experts
> >
> > I am interested to know if SOLR 5.5 supports Same Origin Policy. I am
> > trying to read the data from http://localhost:8984/Solr_1/my/directory1
> > and display it on UI on http://localhost:8983/Solr_2/my/directory2.
> >
> > http://localhost:8983 has Solr 4.10 running and http://localhost:8984 has
> > Solr 5.5 running. I am using Javascript to make XMLHTTP request, but it is
> > failing with NS_ERROR. So I doubt SOLR supports same origin policy.
> >
> > Is this possible? Any suggestion on how to achieve this?
> >
> > Thanks in advance
> >
> > --
> > Aditya Ramachandra Desai
> > MS Computer Science Graduate Student
> > USC Viterbi School of Engineering
> > Los Angeles, CA 90007
> > M : +1-415-463-9864 | L :
> https://www.linkedin.com/in/adityardesai
>



-- 
Aditya Ramachandra Desai
MS Computer Science Graduate Student
USC Viterbi School of Engineering
Los Angeles, CA 90007
M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai


Same origin policy for Apache Solr 5.5

2016-04-02 Thread Aditya Desai
Hello SOLR Experts

I am interested to know if SOLR 5.5 supports Same Origin Policy. I am
trying to read the data from http://localhost:8984/Solr_1/my/directory1 and
display it on UI on http://localhost:8983/Solr_2/my/directory2.

http://localhost:8983 has Solr 4.10 running and http://localhost:8984 has
Solr 5.5 running. I am using Javascript to make an XMLHTTP request, but it is
failing with NS_ERROR. So I doubt SOLR supports same origin policy.

Is this possible? Any suggestion on how to achieve this?

Thanks in advance

-- 
Aditya Ramachandra Desai
MS Computer Science Graduate Student
USC Viterbi School of Engineering
Los Angeles, CA 90007
M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai


Re: Regarding JSON indexing in SOLR 4.10

2016-03-30 Thread Aditya Desai
Hi Erick

Thanks for your email. Here is the attached sample JSON file. When I
indexed the same JSON file with SOLR 5.5 using bin/post it indexed
successfully. Also all of my documents were indexed successfully with 5.5
and not with 4.10.

Regards

On Wed, Mar 30, 2016 at 3:13 PM, Erick Erickson 
wrote:

> The document you're sending to Solr doesn't have an "id" field. The
> copyField directive has
> nothing to do with it. And your copyField would be copying _from_ the
> id field _to_ the
> Keyword field, is that what you intended?
>
> Even if the source and dest fields were reversed, it still wouldn't
> work since there is no id
> field as indicated by the error.
>
> Let's see one of the json files please? Are they carefully-formulated
> or arbitrary files? If
> carefully formulated, just switch
>
> Best,
> Erick
>
> On Wed, Mar 30, 2016 at 11:26 AM, Aditya Desai  wrote:
> > Hi Paul
> >
> > Thanks a lot for your help! I have one small question, I have schema that
> > includes {Keyword,id,currency,geographic_name}. Now I have given
> > <uniqueKey>id</uniqueKey>
> > and
> > <copyField source="id" dest="Keyword"/>
> > Whenever I am running your script I am getting an error as
> >
> > <response>
> >   <lst name="responseHeader"><int name="status">400</int><int name="QTime">2</int></lst>
> >   <lst name="error">
> >     <str name="msg">Document is missing mandatory uniqueKey field: id</str>
> >     <int name="code">400</int>
> >   </lst>
> > </response>
> >
> > Can you please share your expertise advice here. Can you please guide me
> a
> > good source to learn SOLR?
> >
> > I am learning and I would really appreciate if you can help me.
> >
> > Regards
> >
> >
> > On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman  wrote:
> >
> >> On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote:
> >> > I am running SOLR 4.10 on port 8984 by changing the default port in
> >> > etc/jetty.xml. I am now trying to index all my JSON files to Solr
> running
> >> > on 8984. The following is the command
> >> >
> >> > curl 'http://localhost:8984/solr/update?commit=true' --data-binary *.json
> >> > -H 'Content-type:application/json'
> >>
> >> The wildcard is the problem; your shell is expanding --data-binary
> >> *.json to --data-binary foo.json bar.json baz.json and curl doesn't know
> >> how to download bar.json and baz.json.
> >>
> >> Try this instead:
> >>
> >> for file in *.json; do
> >> curl 'http://localhost:8984/solr/update?commit=true' --data-binary "$file" -H 'Content-type:application/json'
> >> done
> >>
> >> Paul.
> >>
> >> --
> >> Paul Hoffman 
> >> Systems Librarian
> >> Fenway Libraries Online
> >> c/o Wentworth Institute of Technology
> >> 550 Huntington Ave.
> >> Boston, MA 02115
> >> (617) 442-2384 (FLO main number)
> >>
> >
> >
> >
> > --
> > Aditya Ramachandra Desai
> > MS Computer Science Graduate Student
> > USC Viterbi School of Engineering
> > Los Angeles, CA 90007
> > M : +1-415-463-9864 | L :
> https://www.linkedin.com/in/adityardesai
>



-- 
Aditya Ramachandra Desai
MS Computer Science Graduate Student
USC Viterbi School of Engineering
Los Angeles, CA 90007
M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai


0A0B69C000E730AE9A1F08E6D7442CC0FB94FC0512624704D06EB48E03C49E16_Output.json
Description: application/json


Re: Regarding JSON indexing in SOLR 4.10

2016-03-30 Thread Aditya Desai
Hi Paul

Thanks a lot for your help! I have one small question, I have schema that
includes {Keyword,id,currency,geographic_name}. Now I have given
<uniqueKey>id</uniqueKey>
and
<copyField source="id" dest="Keyword"/>
Whenever I am running your script I am getting an error as


<response>
  <lst name="responseHeader"><int name="status">400</int><int name="QTime">2</int></lst>
  <lst name="error">
    <str name="msg">Document is missing mandatory uniqueKey field: id</str>
    <int name="code">400</int>
  </lst>
</response>


Can you please share your expert advice here? And can you please point me to
a good source to learn Solr?

I am learning and I would really appreciate if you can help me.

Regards
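
For reference, a hedged sketch of a document that satisfies the uniqueKey
requirement; the field values below are made up for illustration, only the
field names come from the schema described above:

curl 'http://localhost:8984/solr/update?commit=true' \
  -H 'Content-type:application/json' \
  --data-binary '[
    {
      "id": "doc-001",
      "Keyword": "sample keyword",
      "currency": "USD",
      "geographic_name": "Los Angeles"
    }
  ]'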


On Wed, Mar 30, 2016 at 6:55 AM, Paul Hoffman  wrote:

> On Tue, Mar 29, 2016 at 11:30:06PM -0700, Aditya Desai wrote:
> > I am running SOLR 4.10 on port 8984 by changing the default port in
> > etc/jetty.xml. I am now trying to index all my JSON files to Solr running
> > on 8984. The following is the command
> >
> > curl 'http://localhost:8984/solr/update?commit=true' --data-binary *.json
> > -H 'Content-type:application/json'
>
> The wildcard is the problem; your shell is expanding --data-binary
> *.json to --data-binary foo.json bar.json baz.json and curl doesn't know
> how to download bar.json and baz.json.
>
> Try this instead:
>
> for file in *.json; do
> curl 'http://localhost:8984/solr/update?commit=true' --data-binary "$file" -H 'Content-type:application/json'
> done
>
> Paul.
>
> --
> Paul Hoffman 
> Systems Librarian
> Fenway Libraries Online
> c/o Wentworth Institute of Technology
> 550 Huntington Ave.
> Boston, MA 02115
> (617) 442-2384 (FLO main number)
>



-- 
Aditya Ramachandra Desai
MS Computer Science Graduate Student
USC Viterbi School of Engineering
Los Angeles, CA 90007
M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai


Regarding JSON indexing in SOLR 4.10

2016-03-29 Thread Aditya Desai
Hello everyone


I am running SOLR 4.10 on port 8984 by changing the default port in
etc/jetty.xml. I am now trying to index all my JSON files to Solr running
on 8984. The following is the command

curl 'http://localhost:8984/solr/update?commit=true' --data-binary *.json
-H 'Content-type:application/json'


I am getting the error as following

curl: (6) Could not resolve host:
00C3353DDF98B3096D4ADB96E158F0365095762B0E7FD3D0741E046B5CCA0383_Output.json
curl: (6) Could not resolve host:
00C3AAD1A19F00A8295662D612022D186A77C18CD14F5F007484C750CF8B108E_Output.json
curl: (6) Could not resolve host:
00C449E6FF6F69F07A8648F5DB115855133BFC592E70F45A639DD1AF4E52EC5B_Output.json
curl: (6) Could not resolve host:
00C6620B7783C6CE756474748B48F29C06F59474A126D83851753C5474B38A2C_Output.json
curl: (6) Could not resolve host:
00C70C0538BFEA03894F23A912E1ECBA2D7559E1F79B93380B922A99802AC764_Output.json

I am learning Apache Solr for the first time. Your help will be very much
appreciated.

Thanks in advance

Regards
-- 
Aditya Ramachandra Desai
MS Computer Science Graduate Student
USC Viterbi School of Engineering
Los Angeles, CA 90007
M : +1-415-463-9864 | L : https://www.linkedin.com/in/adityardesai


Re: Size of logs are high

2016-02-11 Thread Aditya Sundaram
Can you check your log level? A log level of ERROR would probably suffice
for your purposes, and it would most certainly reduce your log size(s).
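
For reference, a hedged sketch of lowering the verbosity at runtime through
the logging endpoint (host and port are placeholders); for a permanent change,
the log4j configuration file is the place to do it:

curl 'http://localhost:8983/solr/admin/info/logging?set=root:WARN&wt=json'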

On Thu, Feb 11, 2016 at 12:53 PM, kshitij tyagi  wrote:

> Hi,
> I have migrated to solr 5.2 and the size of logs are high.
>
> Can anyone help me out here how to control this?
>



-- 
Aditya Sundaram
Software Engineer, Technology team
AKR Tech park B Block, B1 047
+91-9844006866


Re: Document boost in Solr

2015-11-14 Thread Aditya
Hi

I am able to analyse the score using http://explain.solr.pl/

Score of 1st record:
100% 27.12627 sum of the following:
33.47% 9.078974 sum of the following:
19.34% 5.2460585 (MATCH) max of:
19.34% 5.2460585 PRODUCT_TITLE:machin^50.0
- 0.37926888 PRODUCT_CONTENT:machin^1.5
14.13% 3.8329153 (MATCH) max of:
14.13% 3.8329153 PRODUCT_TITLE:learn^50.0
- 0.28544438 PRODUCT_CONTENT:learn^1.5
66.53% 18.047297 (MATCH) max of:
66.53% 18.047297 PRODUCT_TITLE:"machin learn"~10^50.0
- 1.3227714 PRODUCT_CONTENT:"machin learn"~10^1.5

Score of the 14th record. This was supposed to come in within the top 10.
100% 14.135922 sum of the following:
35.52% 5.0206614 sum of the following:
18.74% 2.6496599 (MATCH) max of:
18.74% 2.6496599 PRODUCT_TITLE:machin^50.0
- 0.22348635 PRODUCT_CONTENT:machin^1.5
16.77% 2.3710015 (MATCH) max of:
16.77% 2.3710015 PRODUCT_TITLE:learn^50.0
- 0.18167646 PRODUCT_CONTENT:learn^1.5
64.48% 9.115261 (MATCH) max of:
64.48% 9.115261 PRODUCT_TITLE:"machin learn"~10^50.0
- 0.7794506 PRODUCT_CONTENT:"machin learn"~10^1.5

How can I analyse whether the document boost is being applied or not?

Regards
Aditya
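
For reference, a request along the lines Alexandre suggested might look like
this (a hedged sketch: the core path, query and document id are placeholders;
PRODUCT_TITLE is the field from the explain output above):

curl 'http://localhost:8983/solr/select' \
  --data-urlencode 'q=PRODUCT_TITLE:"machine learning"' \
  --data-urlencode 'debugQuery=true' \
  --data-urlencode 'explainOther=id:12345' \
  --data-urlencode 'wt=json'

explainOther returns the scoring explanation for the document(s) matching that
extra query even when they fall outside the returned rows, which makes it
easier to see whether the document boost contributes to the score.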


On Sat, Nov 14, 2015 at 8:49 PM, Aditya 
wrote:

> I am not able to understand the debug information.
>
> Any specific parameter to look for?
>
> Regards
> Aditya
>
> On Sat, Nov 14, 2015 at 6:42 PM, Alexandre Rafalovitch  > wrote:
>
>> Did you try using debug.explain.other and seeing how it is ranked?
>> On 14 Nov 2015 6:28 am, "Aditya"  wrote:
>>
>> > Hi
>> >
>> > My website www.findbestopensource.com provides search over millions of
>> > open
>> > source projects.
>> >
>> > I recently found this issue in my website. Each project will have its
>> > description and rank and other set of fields. Rank is set as document
>> > boost, so that when user performs a search, high ranked projects should
>> > appear first.
>> >
>> > It was working fine with previous versions of Solr. Some time back I
>> moved
>> > to 4.10 and after that I am facing this issue. I added a high ranked
>> > project and when I did a search the project is not showing up in the
>> search
>> > results. It is showing the results which were added in older versions of
>> > Solr.
>> >
>> > I am using Solr 4.10  and using Solrj library.
>> >
>> > Regards
>> > Aditya
>> >
>>
>
>


Re: Document boost in Solr

2015-11-14 Thread Aditya
I am not able to understand the debug information.

Any specific parameter to look for?

Regards
Aditya

On Sat, Nov 14, 2015 at 6:42 PM, Alexandre Rafalovitch 
wrote:

> Did you try using debug.explain.other and seeing how it is ranked?
> On 14 Nov 2015 6:28 am, "Aditya"  wrote:
>
> > Hi
> >
> > My website www.findbestopensource.com provides search over millions of
> > open
> > source projects.
> >
> > I recently found this issue in my website. Each project will have its
> > description and rank and other set of fields. Rank is set as document
> > boost, so that when user performs a search, high ranked projects should
> > appear first.
> >
> > It was working fine with previous versions of Solr. Some time back I
> moved
> > to 4.10 and after that I am facing this issue. I added a high ranked
> > project and when I did a search the project is not showing up in the
> search
> > results. It is showing the results which were added in older versions of
> > Solr.
> >
> > I am using Solr 4.10  and using Solrj library.
> >
> > Regards
> > Aditya
> >
>


Document boost in Solr

2015-11-14 Thread Aditya
Hi

My website www.findbestopensource.com provides search over millions of open
source projects.

I recently found this issue in my website. Each project will have its
description and rank and other set of fields. Rank is set as document
boost, so that when user performs a search, high ranked projects should
appear first.

It was working fine with previous versions of Solr. Some time back I moved
to 4.10 and after that I am facing this issue. I added a high ranked
project and when I did a search the project is not showing up in the search
results. It is showing the results which were added in older versions of
Solr.

I am using Solr 4.10  and using Solrj library.

Regards
Aditya


How to:- Extending Tika within Solr

2015-07-23 Thread Aditya Dhulipala
Hi,


I have implemented a new file-type parser for TIka. It parses a custom
filetype (*.mx)


I would like my Solr instance to use my version of Tika with the mx parser.

I found this by a google search

https://lucidworks.com/blog/extending-apache-tika-capabilities/

But it seems to be over 5 years old. And the "download project" link is
broken


Can anybody help me with this?


I tried replacing the tika-* jars within contrib/extraction/lib under the
Solr root with my compiled tika-* jars. But that didn't work; Solr is still
using the old Tika binaries (i.e. without the .mx parser). I know that my
tika-* jars are working correctly, because I can run them in GUI mode and
parse a test .mx file.



Thanks!

-

Aditya


IOException occured when talking to solr server

2014-12-22 Thread Aditya
Hello all

I am getting the following error. Could anyone throw some light on it? I am
accessing Solr via SolrJ; when there is more load on the server I am
getting this error. Is there any way to overcome this situation?

org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://localhost/solr
org.apache.solr.client.solrj.SolrServerException: Server refused connection
at: http://localhost/solr

Once this error is encountered, Tomcat stops responding and I need to
restart the server.

Regards
Aditya
www.findbestopensource.com


Re: Advise on an architecture with lot of cores

2014-10-09 Thread Aditya
Hi Manoj

There are advantages to both approaches. I recently read an article,
http://lucidworks.com/blog/podcast-solr-at-scale-at-aol/ . AOL uses Solr
with one core per user.

Having one core per customer helps you:
1. Easily migrate / back up the index.
2. Load the core as and when required. When a user has signed in, load his
index; otherwise you don't need to keep his data in memory (see the sketch
below).
3. Rebuild data for a particular user more easily.

Cons:
1. If most users are actively signing in and you need to keep most of the
cores loaded all the time, it will hurt search performance.
2. Each core has its own set of files, and you could end up with a "too many
open files" exception. (We faced this scenario.)
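
Regarding point 2 above (loading a core only when required), a hedged sketch
using core discovery; the paths and core name are placeholders, and
transientCacheSize would also need to be sized appropriately in solr.xml:

mkdir -p /var/solr/data/customer_123
cat > /var/solr/data/customer_123/core.properties <<'EOF'
name=customer_123
loadOnStartup=false
transient=true
EOF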


Having a single core for all:
1. This reduces the headache of user-specific handling; you treat the DB /
index as a black box that you can query for everything.
2. When the load grows, shard it.

Cons:
1. Rebuilding the index will take more time.

Regards
Aditya
www.findbestopensource.com






On Tue, Oct 7, 2014 at 8:01 PM, Manoj Bharadwaj 
wrote:

> Hi Toke,
>
> I don't think I answered your question properly.
>
> With the current 1 core/customer setup many cores are idle. The redesign we
> are working on will move most of our searches to being driven by SOLR vs
> database (current split is 90% database, 10% solr). With that change, all
> cores will see traffic.
>
> We have 25G data in the index (across all cores) and they are currently in
> a 2 core VM with 32G memory. We are making some changes to the schema and
> the analyzers and we see the index size growing by 25% or so due to this.
> And to support this we will be moving to a VM with 4 cores and 64G memory.
> Hardware as such isn't a constraint.
>
> Regards
> Manoj
>
> On Tue, Oct 7, 2014 at 8:47 AM, Toke Eskildsen 
> wrote:
>
> > On Tue, 2014-10-07 at 14:27 +0200, Manoj Bharadwaj wrote:
> > > My team inherited a SOLR setup with an architecture that has a core for
> > > every customer. We have a few different types of cores, say "A", "B",
> C",
> > > and for each one of this there is a core per customer - namely "A1",
> > > "A2"..., "B1", "B2"... Overall we have over 600 cores. We don't know
> the
> > > history behind the current design - the exact reasons why it was done
> the
> > > way it was done - one probable consideration was to ensure a customer
> > data
> > > separate from other.
> >
> > It is not a bad reason. It ensures that ranked search is optimized
> > towards each customer's data and makes it easy to manage adding and
> > removing customers.
> >
> > > We want to go to a single core per type architecture, and move on to
> > SOLR
> > > cloud as well in near future to achieve sharding via the features cloud
> > > provides.
> >
> > If the setup is heavy queried on most of the cores or is there are
> > core-spanning searches, collapsing the user-specific cores into fewer
> > super-cores might lower hardware requirements a bit. On the other hand,
> > if most of the cores are idle most of the time, the 1 core/customer
> > setup would give better utilization of the hardware.
> >
> > Why do you want to collapse the cores?
> >
> > - Toke Eskildsen, State and University Library, Denmark
> >
> >
> >
>


Codec - PostingsFormat - Postings/TermsConsumer - Checkpointed merged segment.

2014-06-12 Thread Aditya Tripathi
pointed segment, any commit
will commit all the uncommitted segments without any flush requirement.
So, for example, if we use Solr's Optimize command, after doing a forceMerge()
everything is flushed and then a commit is issued. In this commit, the
custom FieldConsumers are not invoked, so they do not get a chance to commit
any uncommitted in-memory information. We therefore end up with a problem with
the Optimize command, since the merged segment is now committed but our own
in-memory merged state is not.


Thanks in advance for reading this long question. Any thoughts are welcome.
If you are aware of some implementation doing partial updates through
custom codecs, please do let me know.

Kind Regards,
Aditya Tripathi.


Request for adding to Contributors Group

2014-03-31 Thread Aditya Choudhuri

Hello!

Please add my email and SolrWiki account to the ContributorsGroup.

My Wiki name = AdityaChoudhuri 
<https://wiki.apache.org/solr/AdityaChoudhuri>



Thank you.
Aditya





isolating solrcloud instance from peer updates

2013-09-21 Thread Aditya Sakhuja
Hello all,

Is there a way to isolate an active solr-cloud instance from all incoming
replication update requests from peer nodes ?

-- 
Regards,
-Aditya Sakhuja


Re: ReplicationFactor for solrcloud

2013-09-21 Thread Aditya Sakhuja
Thanks Shalin. We used maxShardsPerNode=3 as you suggested here.
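
For reference, a hedged sketch of expressing the same layout through the
Collections API (the collection and config names are placeholders):

curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=3&maxShardsPerNode=3&collection.configName=myconf'

Without maxShardsPerNode=3, each of the 3 nodes hosts only one shard, which
matches the behaviour described in the original mail.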


On Thu, Sep 12, 2013 at 4:09 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> You must specify maxShardsPerNode=3 for this to happen. By default
> maxShardsPerNode defaults to 1 so only one shard is created per node.
>
> On Thu, Sep 12, 2013 at 3:19 AM, Aditya Sakhuja
>  wrote:
> > Hi -
> >
> > I am trying to set the 3 shards and 3 replicas for my solrcloud
> deployment
> > with 3 servers, specifying the replicationFactor=3 and numShards=3 when
> > starting the first node. I see each of the servers allocated to 1 shard
> > each.however, do not see 3 replicas allocated on each node.
> >
> > I specifically need to have 3 replicas across 3 servers with 3 shards. Do
> > we think of any reason to not have this configuration ?
> >
> > --
> > Regards,
> > -Aditya Sakhuja
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,
-Aditya Sakhuja


Re: solrcloud shards backup/restoration

2013-09-19 Thread Aditya Sakhuja
How does one recover from an index corruption ? That's what I am trying to
eventually tackle here.

Thanks
Aditya

On Thursday, September 19, 2013, Aditya Sakhuja wrote:

> Hi,
>
> Sorry for the late followup on this. Let me put in more details here.
>
> *The problem:*
>
> Cannot successfully restore back the index backed up with
> '/replication?command=backup'. The backup was generated as *
> snapshot.mmdd*
>
> *My setup and steps:*
> *
> *
> 6 solrcloud instances
> 7 zookeepers instances
>
> Steps:
>
> 1.> Take snapshot using *http://host1:8893/solr/replication?command=backup
> *, on one host only. move *snapshot.mmdd *to some reliable storage.
>
> 2.> Stop all 6 solr instances, all 7 zk instances.
>
> 3.> Delete ../collectionname/data/* on all solrcloud nodes. ie. deleting
> the index data completely.
>
> 4.> Delete zookeeper/data/version*/* on all zookeeper nodes.
>
> 5.> Copy back index from backup to one of the nodes.
>  \> cp *snapshot.mmdd/*  *../collectionname/data/index/*
>
> 6.> Restart all zk instances. Restart all solrcloud instances.
>
>
> *Outcome:*
> *
> *
> All solr instances are up. However, *num of docs = 0 *for all nodes.
> Looking at the node where the index was restored, there is a new
> index.yymmddhhmmss directory being created and index.properties pointing to
> it. That explains why no documents are reported.
>
>
> How do I have solrcloud pickup data from the index directory on a restart
> ?
>
> Thanks in advance,
> Aditya
>
>
>
> On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja 
> wrote:
>
> Thanks Shalin and Mark for your responses. I am on the same page about the
> conventions for taking the backup. However, I am less sure about the
> restoration of the index. Lets say we have 3 shards across 3 solrcloud
> servers.
>
> 1.> I am assuming we should take a backup from each of the shard leaders
> to get a complete collection. do you think that will get the complete index
> ( not worrying about what is not hard committed at the time of backup ). ?
>
> 2.> How do we go about restoring the index in a fresh solrcloud cluster ?
> From the structure of the snapshot I took, I did not see any
> replication.properties or index.properties  which I see normally on a
> healthy solrcloud cluster nodes.
> if I have the snapshot named snapshot.20130905 does the
> snapshot.20130905/* go into data/index ?
>
> Thanks
> Aditya
>
>
>
> On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller  wrote:
>
> Phone typing. The end should not say "don't hard commit" - it should say
> "do a hard commit and take a snapshot".
>
> Mark
>
> Sent from my iPhone
>
> On Sep 6, 2013, at 7:26 AM, Mark Miller  wrote:
>
> > I don't know that it's too bad though - its always been the case that if
> you do a backup while indexing, it's just going to get up to the last hard
> commit. With SolrCloud that will still be the case. So just make sure you
> do a hard commit right before taking the backup - yes, it might miss a few
> docs in the tran log, but if you are taking a back up while indexing, you
> don't have great precision in any case - you will roughly get a snapshot
> for around that time - even without SolrCloud, if you are worried about
> precision and getting every update into that backup, you want to stop
> indexing and commit first. But if you just want a rough snapshot for around
> that time, in both cases you can still just don't hard commit and take a
> snapshot.
> >
> > Mark
> >
> > Sent from my iPhone
> >
> > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
> >
> >> The replication handler's backup command was built for pre-SolrCloud.
> >> It takes a snapshot of the index but it is unaware of the transaction
> >> log which is a key component in SolrCloud. Hence unless you stop
> >> updates, commit your changes and then take a backup, you will likely
> >> miss some updates.
> >>
> >> That being said, I'm curious to see how peer sync behaves when you try
> >> to restore from a snapshot. When you say that you haven't been
> >> successful in restoring, what exactly is the behaviour you observed?
> >>
> >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja <
> aditya.sakh...@gmail.com> wrote:
> >>> Hello,
> >>>
> >>> I was looking for a good backup / recovery solution for the solrcloud
> >>> indexes. I am more looking for restoring the indexes from the index
> >>> sn

Re: solrcloud shards backup/restoration

2013-09-19 Thread Aditya Sakhuja
Hi,

Sorry for the late followup on this. Let me put in more details here.

*The problem:*

Cannot successfully restore back the index backed up with
'/replication?command=backup'. The backup was generated as
*snapshot.yyyymmdd*.

*My setup and steps:*
*
*
6 solrcloud instances
7 zookeepers instances

Steps:

1.> Take a snapshot using *http://host1:8893/solr/replication?command=backup*,
on one host only. Move *snapshot.yyyymmdd* to some reliable storage.

2.> Stop all 6 solr instances, all 7 zk instances.

3.> Delete ../collectionname/data/* on all solrcloud nodes, i.e. delete
the index data completely.

4.> Delete zookeeper/data/version*/* on all zookeeper nodes.

5.> Copy the index back from the backup to one of the nodes:
     cp snapshot.yyyymmdd/* ../collectionname/data/index/

6.> Restart all zk instances. Restart all solrcloud instances.


*Outcome:*
*
*
All solr instances are up. However, *num of docs = 0 *for all nodes.
Looking at the node where the index was restored, there is a new
index.yymmddhhmmss directory being created and index.properties pointing to
it. That explains why no documents are reported.


How do I have SolrCloud pick up data from the index directory on a restart?
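
(For what it's worth, a hedged sketch of one workaround, not verified against
this exact setup: after copying the snapshot, point index.properties at the
directory that actually holds the restored segments, so the restart does not
switch to a freshly created, empty index.<timestamp> directory.)

cp -r snapshot.20130905 ../collectionname/data/restored.20130905
echo 'index=restored.20130905' > ../collectionname/data/index.properties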

Thanks in advance,
Aditya



On Fri, Sep 6, 2013 at 3:41 PM, Aditya Sakhuja wrote:

> Thanks Shalin and Mark for your responses. I am on the same page about the
> conventions for taking the backup. However, I am less sure about the
> restoration of the index. Lets say we have 3 shards across 3 solrcloud
> servers.
>
> 1.> I am assuming we should take a backup from each of the shard leaders
> to get a complete collection. do you think that will get the complete index
> ( not worrying about what is not hard committed at the time of backup ). ?
>
> 2.> How do we go about restoring the index in a fresh solrcloud cluster ?
> From the structure of the snapshot I took, I did not see any
> replication.properties or index.properties  which I see normally on a
> healthy solrcloud cluster nodes.
> if I have the snapshot named snapshot.20130905 does the
> snapshot.20130905/* go into data/index ?
>
> Thanks
> Aditya
>
>
>
> On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller  wrote:
>
>> Phone typing. The end should not say "don't hard commit" - it should say
>> "do a hard commit and take a snapshot".
>>
>> Mark
>>
>> Sent from my iPhone
>>
>> On Sep 6, 2013, at 7:26 AM, Mark Miller  wrote:
>>
>> > I don't know that it's too bad though - its always been the case that
>> if you do a backup while indexing, it's just going to get up to the last
>> hard commit. With SolrCloud that will still be the case. So just make sure
>> you do a hard commit right before taking the backup - yes, it might miss a
>> few docs in the tran log, but if you are taking a back up while indexing,
>> you don't have great precision in any case - you will roughly get a
>> snapshot for around that time - even without SolrCloud, if you are worried
>> about precision and getting every update into that backup, you want to stop
>> indexing and commit first. But if you just want a rough snapshot for around
>> that time, in both cases you can still just don't hard commit and take a
>> snapshot.
>> >
>> > Mark
>> >
>> > Sent from my iPhone
>> >
>> > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>> >
>> >> The replication handler's backup command was built for pre-SolrCloud.
>> >> It takes a snapshot of the index but it is unaware of the transaction
>> >> log which is a key component in SolrCloud. Hence unless you stop
>> >> updates, commit your changes and then take a backup, you will likely
>> >> miss some updates.
>> >>
>> >> That being said, I'm curious to see how peer sync behaves when you try
>> >> to restore from a snapshot. When you say that you haven't been
>> >> successful in restoring, what exactly is the behaviour you observed?
>> >>
>> >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja <
>> aditya.sakh...@gmail.com> wrote:
>> >>> Hello,
>> >>>
>> >>> I was looking for a good backup / recovery solution for the solrcloud
>> >>> indexes. I am more looking for restoring the indexes from the index
>> >>> snapshot, which can be taken using the replicationHandler's backup
>> command.
>> >>>
>> >>> I am looking for something that works with solrcloud 4.3 eventually,
>> but
>> >>> still relevant if you tested with a previous version.
>> >>>
>> >>> I haven't been successful in have the restored index replicate across
>> the
>> >>> new replicas, after I restart all the nodes, with one node having the
>> >>> restored index.
>> >>>
>> >>> Is restoring the indexes on all the nodes the best way to do it ?
>> >>> --
>> >>> Regards,
>> >>> -Aditya Sakhuja
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Shalin Shekhar Mangar.
>>
>
>
>
> --
> Regards,
> -Aditya Sakhuja
>



-- 
Regards,
-Aditya Sakhuja


ReplicationFactor for solrcloud

2013-09-11 Thread Aditya Sakhuja
Hi -

I am trying to set up 3 shards and 3 replicas for my SolrCloud deployment
with 3 servers, specifying replicationFactor=3 and numShards=3 when
starting the first node. I see each of the servers allocated 1 shard;
however, I do not see 3 replicas allocated on each node.

I specifically need to have 3 replicas across 3 servers with 3 shards. Can
anyone think of a reason not to have this configuration?

-- 
Regards,
-Aditya Sakhuja


Re: solrcloud shards backup/restoration

2013-09-06 Thread Aditya Sakhuja
Thanks Shalin and Mark for your responses. I am on the same page about the
conventions for taking the backup. However, I am less sure about the
restoration of the index. Lets say we have 3 shards across 3 solrcloud
servers.

1.> I am assuming we should take a backup from each of the shard leaders to
get a complete collection. do you think that will get the complete index (
not worrying about what is not hard committed at the time of backup ). ?

2.> How do we go about restoring the index in a fresh solrcloud cluster ?
>From the structure of the snapshot I took, I did not see any
replication.properties or index.properties  which I see normally on a
healthy solrcloud cluster nodes.
if I have the snapshot named snapshot.20130905 does the snapshot.20130905/*
go into data/index ?

Thanks
Aditya



On Fri, Sep 6, 2013 at 7:28 AM, Mark Miller  wrote:

> Phone typing. The end should not say "don't hard commit" - it should say
> "do a hard commit and take a snapshot".
>
> Mark
>
> Sent from my iPhone
>
> On Sep 6, 2013, at 7:26 AM, Mark Miller  wrote:
>
> > I don't know that it's too bad though - its always been the case that if
> you do a backup while indexing, it's just going to get up to the last hard
> commit. With SolrCloud that will still be the case. So just make sure you
> do a hard commit right before taking the backup - yes, it might miss a few
> docs in the tran log, but if you are taking a back up while indexing, you
> don't have great precision in any case - you will roughly get a snapshot
> for around that time - even without SolrCloud, if you are worried about
> precision and getting every update into that backup, you want to stop
> indexing and commit first. But if you just want a rough snapshot for around
> that time, in both cases you can still just don't hard commit and take a
> snapshot.
> >
> > Mark
> >
> > Sent from my iPhone
> >
> > On Sep 6, 2013, at 1:13 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
> >
> >> The replication handler's backup command was built for pre-SolrCloud.
> >> It takes a snapshot of the index but it is unaware of the transaction
> >> log which is a key component in SolrCloud. Hence unless you stop
> >> updates, commit your changes and then take a backup, you will likely
> >> miss some updates.
> >>
> >> That being said, I'm curious to see how peer sync behaves when you try
> >> to restore from a snapshot. When you say that you haven't been
> >> successful in restoring, what exactly is the behaviour you observed?
> >>
> >> On Fri, Sep 6, 2013 at 5:14 AM, Aditya Sakhuja <
> aditya.sakh...@gmail.com> wrote:
> >>> Hello,
> >>>
> >>> I was looking for a good backup / recovery solution for the solrcloud
> >>> indexes. I am more looking for restoring the indexes from the index
> >>> snapshot, which can be taken using the replicationHandler's backup
> command.
> >>>
> >>> I am looking for something that works with solrcloud 4.3 eventually,
> but
> >>> still relevant if you tested with a previous version.
> >>>
> >>> I haven't been successful in have the restored index replicate across
> the
> >>> new replicas, after I restart all the nodes, with one node having the
> >>> restored index.
> >>>
> >>> Is restoring the indexes on all the nodes the best way to do it ?
> >>> --
> >>> Regards,
> >>> -Aditya Sakhuja
> >>
> >>
> >>
> >> --
> >> Regards,
> >> Shalin Shekhar Mangar.
>



-- 
Regards,
-Aditya Sakhuja


data/index naming format

2013-09-05 Thread Aditya Sakhuja
Hello,

I am running solr 4.1 for now, and am confused about the structure and
naming of the contents of the data dir. I do not see the index.properties
being generated on a fresh solr node start either.

Can someone clarify when one should expect to see

data/index vs. data/index.<timestamp>, and the index.properties that goes
along with the second variant?

-- 
Regards,
-Aditya Sakhuja


solrcloud shards backup/restoration

2013-09-05 Thread Aditya Sakhuja
Hello,

I was looking for a good backup / recovery solution for the solrcloud
indexes. I am more looking for restoring the indexes from the index
snapshot, which can be taken using the replicationHandler's backup command.

I am looking for something that works with solrcloud 4.3 eventually, but
still relevant if you tested with a previous version.

I haven't been successful in have the restored index replicate across the
new replicas, after I restart all the nodes, with one node having the
restored index.

Is restoring the indexes on all the nodes the best way to do it ?
-- 
Regards,
-Aditya Sakhuja


Solr 4.1 default commit mode

2013-08-01 Thread Aditya Sakhuja
Hi,

Can someone please confirm what the default "commit" type is for SolrCloud
4.1? As per
https://cwiki.apache.org/confluence/display/solr/UpdateHandlers+in+SolrConfig,
it looks like softCommit is false (which means every index update triggers
IO). Apparently that page applies to the future SolrCloud 4.5.

I would appreciate it if someone could confirm this for Solr 4.1.

My second question is: is it OK to have different commit types on
different nodes which are part of my SolrCloud deployment?
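
For illustration, a hedged sketch of the two commit styles being discussed
(host and core name are placeholders):

# hard commit: flushes segments to disk and opens a new searcher
curl 'http://localhost:8983/solr/collection1/update?commit=true'

# soft commit: makes documents visible without the expensive flush to disk
curl 'http://localhost:8983/solr/collection1/update?commit=true&softCommit=true'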

Regards,
Aditya


-- 
Regards,
-Aditya Sakhuja


Re: Solr Cloud - How to balance Batch and Queue indexing?

2013-07-30 Thread Aditya
Hi,

Do you want 5 replicas? 1 or 2 should be enough.

If you already have 100 million records, you don't need to do batch
indexing. Push them once; Solr has the capability to soft commit every N docs.

Use round robin and send documents to the different cores. When you search,
search across all the cores.

How do you want to set up your servers: master/slave or failover? In the
master/slave case, index documents on the master and search from the replica
cores. In the failover case, your replica will be used once your main server
has failed.
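
As a hedged sketch of the near-real-time push path (host, collection and
fields are placeholders): UI-driven updates can carry commitWithin so they
become searchable quickly, while the batch job relies on the normal autoCommit
cycle; overwrite=false is the flag mentioned in the original mail for not
replacing existing documents.

curl 'http://localhost:8983/solr/listings/update?commitWithin=5000&overwrite=false' \
  -H 'Content-type:application/json' \
  --data-binary '[{"id": "listing-42", "title": "pushed from the UI"}]'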

Regards
Aditya
www.findbestopensource.com




On Tue, Jul 30, 2013 at 4:56 AM, SolrLover  wrote:

> I need some advice on the best way to implement Batch indexing with soft
> commit / Push indexing (via queue) with soft commit when using SolrCloud.
>
> *I am trying to figure out a way to:
> *
> 1. Make the push indexing available almost real time (using soft commit)
> without degrading the search / indexing performance.
> 2. Ability to not overwrite the existing document (based on listing_id, I
> assume I can use overwrite=false flag to disable overwrite).
> 3. Not block the push indexing when delta indexing happens (push indexing
> happens via UI, user should be able to search for the document pushed via
> UI
> almost instantaneously). Delta processing might take more time to complete
> indexing and I don't want the queue to wait until the batch processing is
> complete.
> 4. Copy the updated collection for backup.
>
> *More information on setup:
> *We have 100 million records (around 6 stored fields / 12 indexed fields).
> We are planning to have 5 cores (each with 20 million documents) with 5
> replicas.
> We will be always doing delta batch indexing.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-Cloud-How-to-balance-Batch-and-Queue-indexing-tp4081169.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: processing documents in solr

2013-07-28 Thread Aditya
Hi,

The easiest solution would be to have the timestamp indexed. Is there any
issue with re-indexing?
If you want to process records in batches, then you need an ordered list and
a bookmark. You require a field to sort on, and a counter / last id
maintained as the bookmark. This is mandatory to solve your problem.

If you don't want to re-index, then you need to maintain information about
the visited documents. Have a database / Solr core which maintains the list
of IDs that have already been processed. Fetch records from Solr and, for
each record, check that store to see whether it has already been processed.
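
A hedged sketch of that "ordered list + bookmark" traversal (the field and
core names are placeholders; it assumes an indexed, sortable id field):

curl 'http://localhost:8983/solr/select' \
  --data-urlencode 'q=id:[LAST_ID_SEEN TO *]' \
  --data-urlencode 'sort=id asc' \
  --data-urlencode 'rows=1000' \
  --data-urlencode 'wt=json'
# record the highest id of the batch as the new bookmark; skip the first
# document of the next batch if it repeats the bookmark value.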

Regards
Aditya
www.findbestopensource.com





On Mon, Jul 29, 2013 at 10:26 AM, Joe Zhang  wrote:

> Basically, I was thinking about running a range query like Shawn suggested
> on the tstamp field, but unfortunately it was not indexed. Range queries
> only work on indexed fields, right?
>
>
> On Sun, Jul 28, 2013 at 9:49 PM, Joe Zhang  wrote:
>
> > I've been thinking about tstamp solution int the past few days. but too
> > bad, the field is avaialble but not indexed...
> >
> > I'm not familiar with SolrJ. Again, sounds like SolrJ is providing the
> > counter value. If yes, that would be equivalent to an autoincrement id.
> I'm
> > indexing from Nutch though; don't know how to feed in such counter...
> >
> >
> > On Sun, Jul 28, 2013 at 7:03 AM, Erick Erickson  >wrote:
> >
> >> Why wouldn't a simple timestamp work for the ordering? Although
> >> I guess "simple timestamp" isn't really simple if the time settings
> >> change.
> >>
> >> So how about a simple counter field in your documents? Assuming
> >> you're indexing from SolrJ, your setup is to query q=*:*&sort=counter
> >> desc.
> >> Take the counter from the first document returned. Increment for
> >> each doc for the life of the indexing run. Now you've got, for all
> intents
> >> and purposes, an identity field albeit manually maintained.
> >>
> >> Then use your counter field as Shawn suggests for pulling all the
> >> data out.
> >>
> >> FWIW,
> >> Erick
> >>
> >> On Sun, Jul 28, 2013 at 1:01 AM, Maurizio Cucchiara
> >>  wrote:
> >> > In both cases, for better performance, first I'd load just all the
> IDs,
> >> > after, during processing I'd load each document.
> >> > For what concern the incremental requirement, it should not be
> >> difficult to
> >> > write an hash function which maps a non-numerical I'd to a value.
> >> >  On Jul 27, 2013 7:03 AM, "Joe Zhang"  wrote:
> >> >
> >> >> Dear list:
> >> >>
> >> >> I have an ever-growing solr repository, and I need to process every
> >> single
> >> >> document to extract statistics. What would be a reasonable process
> that
> >> >> satifies the following properties:
> >> >>
> >> >> - Exhaustive: I have to traverse every single document
> >> >> - Incremental: in other words, it has to allow me to divide and
> >> conquer ---
> >> >> if I have processed the first 20k docs, next time I can start with
> >> 20001.
> >> >>
> >> >> A simple "*:*" query would satisfy the 1st but not the 2nd property.
> In
> >> >> fact, given that the processing will take very long, and the
> repository
> >> >> keeps growing, it is not even clear that the exhaustiveness is
> >> achieved.
> >> >>
> >> >> I'm running solr 3.6.2 in a single-machine setting; no hadoop
> >> capability
> >> >> yet. But I guess the same issues still hold even if I have the solr
> >> cloud
> >> >> environment, right, say in each shard?
> >> >>
> >> >> Any help would be greatly appreciated.
> >> >>
> >> >> Joe
> >> >>
> >>
> >
> >
>


Re: Duplicate documents based on attribute

2013-07-25 Thread Aditya
You could store the color field as a multi-valued stored field, but then you
have to do pagination manually. If that worries you, use a database: have a
table with product name and color, and retrieve the data with pagination.

If you still want to achieve it via Solr, have a separate record for every
product and color: ProductName, Color, RecordType. Since Solr is NoSQL, the
records can have different fields; not all records need to have all the
fields. You can store different types of documents and filter the records by
type.
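
A hedged sketch of that layout (host, field names and values are
placeholders): one document per product/colour variant plus a type
discriminator, filtered at query time.

curl 'http://localhost:8983/solr/update?commit=true' \
  -H 'Content-type:application/json' \
  --data-binary '[
    {"id": "prodA-red",   "ProductName": "Product A", "Color": "red",   "RecordType": "variant"},
    {"id": "prodA-blue",  "ProductName": "Product A", "Color": "blue",  "RecordType": "variant"},
    {"id": "prodA-green", "ProductName": "Product A", "Color": "green", "RecordType": "variant"}
  ]'

curl 'http://localhost:8983/solr/select' \
  --data-urlencode 'q=ProductName:"Product A"' \
  --data-urlencode 'fq=RecordType:variant' \
  --data-urlencode 'wt=json'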

Regards
Aditya
www.findbestopensource.com






On Thu, Jul 25, 2013 at 11:01 PM, Alexandre Rafalovitch
wrote:

> Look for the presentations online. You are not the first store to use Solr,
> there are some explanations around. Try one from Gilt, but I think there
> were more.
>
> You will want to store data at the lowest meaningful level of search
> granularity. So, in your case, it might be ProductVariation (shoes+color).
> Some examples I have seen, even store it down to availability level or
> price-difference level. Then, you do some post-search normalization either
> by doing groups or by doing filtering.
>
> Solr is not a database, store what you want to find.
>
> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Thu, Jul 25, 2013 at 12:42 PM, Mark  wrote:
>
> > How would I go about doing something like this. Not sure if this is
> > something that can be accomplished on the index side or its something
> that
> > should be done in our application.
> >
> > Say we are an online store for shoes and we are selling Product A in red,
> > blue and green. Is there a way when we search for Product A all three
> > results can be returned even though they are logically the same item
> (same
> > product in our database).
> >
> > Thoughts on how this can be accomplished?
> >
> > Thanks
> >
> > - M
>


Re: Auto Indexing in Solr

2013-07-25 Thread Aditya
Hi

You could use a Java timer to trigger your DB import every X minutes. Another
option: your application may know when your DB is updated, so whenever the DB
changes, trigger a request to index the newly added data.
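
For reference, a hedged sketch of the cron option (the port, core name,
handler path and schedule are placeholders), assuming the DataImportHandler
is registered at /dataimport:

# run a delta-import every 15 minutes
*/15 * * * *  curl -s 'http://localhost:8983/solr/mycore/dataimport?command=delta-import&clean=false' > /dev/null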

Regards
Aditya
www.findbestopensource.com



On Thu, Jul 25, 2013 at 11:42 AM, archit2112  wrote:

> Hi Im using Solr 4's Data Import Utility to index Oracle 10g XE database.
> Im
> using full imports as well as delta imports. I want these processes to be
> automatic. (Eg: The import processes can be timed or should be executed as
> soon any data in the database is modified). I searched for the same online
> and I heard people talk about CRON and scripts. However, Im not able to
> figure out how to implement it. Can you please provide a tutorial like
> explanation? Thanks in advance
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Auto-Indexing-in-Solr-tp4080233.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: <autoCommit> and performance

2013-07-18 Thread Aditya
Hi

It totally depends on what you can afford. If you can afford it, go for more
RAM, an SSD drive and a 64-bit OS.

Benchmark your application with a certain set of docs: how much RAM it
takes, indexing time, search time, etc. Increase the document count and run
the benchmarks again. This will give you more information. Everything is
directly proportional to the number of docs.

In my case, I have a basic hosting plan and I am happy with the performance.
My point is that you don't always need fancy hardware. Start with something
basic and change the plan based on need.

Regards
Aditya
www.findbestopensource.com





On Wed, Jul 17, 2013 at 4:55 PM, Ayman Plaha  wrote:

> Thanks Aditya, can I also please get some advice on hosting.
>
>   - What *hosting specs* should I get? How much RAM? Considering that my
>     client application is very simple: it just registers users to a database,
>     queries SOLR and displays SOLR results.
>   - A simple batch program adds 1000 or 2000 documents to SOLR every
>     second.
>
> I'm hoping to deploy the code next week, if you guys can give me any other
> advice I'd really appreciate that.
>
>
> On Wed, Jul 17, 2013 at 7:07 PM, Aditya  >wrote:
>
> > Hi
> >
> > It will not affect the performance. We are doing this  regularly. If you
> do
> > optimize and search then there may be some impact.
> >
> > Regards
> > Aditya
> > www.findbestopensource.com
> >
> >
> >
> > On Wed, Jul 17, 2013 at 12:52 PM, Ayman Plaha 
> > wrote:
> >
> > > Hey Guys,
> > >
> > > I've finally finished my Spring Java application that uses SOLR for
> > > searches and just had performance related question about SOLR. I'm
> > indexing
> > > exactly 1000 *OR* 2000 records every second. Every record having 13
> > fields
> > > including 'id'. Majority of the fields are solr.StrField (no filters)
> > with
> > > characters ranging from 5 - 50 in length and one field which is text_t
> > > (solr.TextField) which can be of length 100 characters to 2000
> characters
> > > and has the following tokenizer and filters
> > >
> > >- PatternTokenizerFactory
> > >- LowerCaseFilterFactory
> > >- SynonymFilterFactory
> > >- SnowballPorterFilterFactory.
> > >
> > >
> > > I'm not using shards. I was hoping when searches get slow I will
> consider
> > > this or should I consider this now ?
> > >
> > > *Questions:*
> > >
> > >- I'm using SOLR autoCommit (every 15 minutes) with openSearcher set
> > as
> > >true. I'm not using autoSoftCommit because instant availability of
> the
> > >documents for search is not necessary and I don't want to chew up
> too
> > > much
> > >memory because I'm consider Cloud hosting.
> > >    <autoCommit>
> > >      <maxTime>900000</maxTime>
> > >      <openSearcher>true</openSearcher>
> > >    </autoCommit>
> > >    will this effect the query performance of the client website if the
> > >index grew to 10 million records ? I mean while the commit is
> > happening
> > >does that *effect the performance of queries* and how will this
> effect
> > >the queries if the index grew to 10 million records ?
> > >- What *hosting specs* should I get ? How much RAM ? Considering my
> > >- client application is very simple that just register users to
> > database
> > >and queries SOLR and displays SOLR results.
> > >- simple batch program adds the 1000 OR 2000 documents to SOLR every
> > >second.
> > >
> > >
> > > I'm hoping to deploy the code next week, if you guys can give me any
> > other
> > > advice I'd really appreciate that.
> > >
> > > Thanks
> > > Ayman
> > >
> >
>


Re: <autoCommit> and performance

2013-07-17 Thread Aditya
Hi

It will not affect performance. We are doing this regularly. If you
optimize and search at the same time, then there may be some impact.

Regards
Aditya
www.findbestopensource.com



On Wed, Jul 17, 2013 at 12:52 PM, Ayman Plaha  wrote:

> Hey Guys,
>
> I've finally finished my Spring Java application that uses SOLR for
> searches and just had performance related question about SOLR. I'm indexing
> exactly 1000 *OR* 2000 records every second. Every record having 13 fields
> including 'id'. Majority of the fields are solr.StrField (no filters) with
> characters ranging from 5 - 50 in length and one field which is text_t
> (solr.TextField) which can be of length 100 characters to 2000 characters
> and has the following tokenizer and filters
>
>- PatternTokenizerFactory
>- LowerCaseFilterFactory
>- SynonymFilterFactory
>- SnowballPorterFilterFactory.
>
>
> I'm not using shards. I was hoping when searches get slow I will consider
> this or should I consider this now ?
>
> *Questions:*
>
>- I'm using SOLR autoCommit (every 15 minutes) with openSearcher set as
>true. I'm not using autoSoftCommit because instant availability of the
>documents for search is not necessary and I don't want to chew up too
> much
>memory because I'm consider Cloud hosting.
>    <autoCommit>
>      <maxTime>900000</maxTime>
>      <openSearcher>true</openSearcher>
>    </autoCommit>
>    will this effect the query performance of the client website if the
>index grew to 10 million records ? I mean while the commit is happening
>does that *effect the performance of queries* and how will this effect
>the queries if the index grew to 10 million records ?
>   - What *hosting specs* should I get? How much RAM? Considering that my
>     client application is very simple: it just registers users to a database,
>     queries SOLR and displays SOLR results.
>   - A simple batch program adds 1000 or 2000 documents to SOLR every
>     second.
>
>
> I'm hoping to deploy the code next week, if you guys can give me any other
> advice I'd really appreciate that.
>
> Thanks
> Ayman
>


Re: SOLR guidance required

2013-05-10 Thread Aditya
Hi Kamal,

It is feasible, and that is the correct approach. Add additional fields like
salary, experience, etc. to the index and filter the results there. This way
you can show the results to the user directly.

It is always better to avoid two searches, one in Solr and the other in the
DB. Maintain the search fields in Solr and filter results in Solr. The DB is
used to hold the extended fields and maybe the whole document (the resume).

Refer to the range query syntax: http://wiki.apache.org/solr/SolrQuerySyntax
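
A hedged sketch of such a query (field names, values and the core path are
placeholders):

curl 'http://localhost:8983/solr/select' \
  --data-urlencode 'q=resume_text:(java AND mysql)' \
  --data-urlencode 'fq=salary:[300000 TO 600000]' \
  --data-urlencode 'fq=experience:[3 TO 8]' \
  --data-urlencode 'fq=age:[* TO 35]' \
  --data-urlencode 'rows=100' \
  --data-urlencode 'wt=json'

The range filters only work if salary, experience and age are indexed
(ideally numeric) fields, which is why they need to be added to the documents
at indexing time.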

Regards
Aditya
www.findbestopensource.com


On Fri, May 10, 2013 at 9:11 AM, Kamal Palei  wrote:

> Dear SOLR experts
> I might be asking a very silly question. As I am new to SOLR kindly guide
> me.
>
>
> I have a job site. Using SOLR to search resumes. When a HR user enters some
> keywords say JAVA, MySQL etc, I search resume documents using SOLR,
> retrieve 100 records and show to user.
>
> The problem I face is say, I retrieved 100 records, then we do filtering
> for experience range, age range, salary range (using mysql query).
> Sometimes it so happens that the 100 records I fetch , I do not get a
> single record to show to user. When user clicks next link there might be
> few records, it looks odd really.
>
>
> I hope there must be some mechanism, by which I can associate salary,
> experience, age etc with resume document during indexing. And when
> I search for resumes I can give all filters accordingly and can retrieve
> 100 records and strait way I can show 100 records to user without doing any
> mysql query. Please let me know if this is feasible. If so, kindly give me
> some pointer how do I do it.
>
> Best Regards
> Kamal
>


Re: Group.query

2012-09-26 Thread Aditya
Hi

You are doing an AND search, so you are getting a single group containing
prod1 and prod2. I guess you should issue one query for group1 and another
query for group2.
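
For reference, the two per-group document lists can also be requested in one
call by repeating group.query instead of ANDing the clauses (a hedged sketch,
using Peter's field names; host and core path are placeholders):

curl 'http://localhost:8983/solr/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'group=true' \
  --data-urlencode 'group.query=group_1_bool:true' \
  --data-urlencode 'group.query=group_2_bool:true' \
  --data-urlencode 'group.limit=10' \
  --data-urlencode 'wt=json'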

Regards
Aditya
www.findbestopensource.com



On Wed, Sep 26, 2012 at 12:26 PM, Peter Kirk  wrote:

> Hi
>
> I have "products" which belong to one or more "groups".
> Products are documents in Solr, while the groups are fields (eg.
> group_1_bool:true).
>
> For example:
>
> Prod1 => group1, group2
> Prod2 => group1, group2
> Prod3 => group1
> Prod4 => group2
>
> I would like to execute a query which results in the groups with their
> products. That is, the result should be something like:
>
> Group1 => Prod1, Prod2, Prod3
> Group2 => Prod1, Prod2, Prod4
>
> How can I do this?
>
> I've been looking at group.query, but I don't think this is what I want.
>
> For example, "q=*:*&group.query=group_1_bool:true+AND+group_2_bool:true"
> Results in 1 group called "group_1_bool:true AND group_2_bool:true", which
> contains prod1 and prod2.
>
>
> Thanks,
> Peter
>
>


Re: Is there any relationship between size of index and indexing performance?

2012-05-28 Thread Aditya
Hi Ivan,

It depends on the number of terms it has to load. If you index a small
amount of data but store a large amount, then your index size may be big
while the actual number of terms is small.

It is not directly proportional.

Regards
Aditya
www.findbestopensource.com


On Mon, May 28, 2012 at 3:00 PM, Ivan Hrytsyuk
wrote:

> Let's assume we are indexing 1GB of data. Does size of index have any
> impact on indexing performance? I.e. will we have any difference in case of
> empty index vs 50 GB index?
>
> Thank you, Ivan
>


Re: Strategy for maintaining De-normalized indexes

2012-05-23 Thread Aditya
Hi Sohail,

In my previous mail, I mentioned storing categories as separate records.
Store and index (category name, main product name) as one record, and
(child product name, main product name) as another record.

When you want the count:
1. Retrieve the main product names matching the category.
2. Retrieve the list of child products matching those main products.

You may need two queries, but it is worth it: you don't have to delete and
re-add a bunch of records.

Regards
Aditya
www.findbestopensource.com


On Tue, May 22, 2012 at 5:12 PM, Sohail Aboobaker wrote:

> We are still in design phase, so we haven't hit any performance issues. We
> do not want to discover performance issues too late during QA :) We would
> rather account for any issues during the design phase.
>
> The refresh rate on fields that we are using from master table will be
> rare. May be three or four times in a year.
>
> Regards,
> Sohail
>