Re: Queries not supported by Lucene Query Parser syntax

2015-01-01 Thread David Philip
Hi Leonid,

   Have you had a look at the edismax query parser [1]? Is it of any use for
your requirement? I am not sure whether it is exactly what you are looking
for, but your question seemed related to it.


[1] http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax
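On the specific ~n (minimum-should-match) point in the question below: edismax exposes Lucene's BooleanQuery.setMinimumNumberShouldMatch() as the mm request parameter. A hedged sketch of assembling such a request URL with only the JDK; the host, collection name, and query terms are illustrative assumptions, not from the thread:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EdismaxMmExample {

    // Build a Solr select URL that asks edismax to require at least `mm`
    // of the optional clauses to match - the equivalent of Lucene's
    // BooleanQuery.setMinimumNumberShouldMatch(mm).
    static String buildQueryUrl(String baseUrl, String query, int mm) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return baseUrl + "/select?defType=edismax&q=" + q + "&mm=" + mm;
    }

    public static void main(String[] args) {
        // Hypothetical host/collection and query terms, for illustration only.
        System.out.println(buildQueryUrl(
                "http://localhost:8983/solr/collection1", "red apples fresh", 2));
    }
}
```

mm also accepts percentages and conditional expressions; the integer form above is the closest analogue of the Lucene call.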



On Thu, Jan 1, 2015 at 2:38 PM, Leonid Bolshinsky 
wrote:

> Hello,
>
> Are we always limited by the query parser syntax when passing a query
> string to Solr?
> What about the query elements which are not supported by the syntax?
> For example, BooleanQuery.setMinimumNumberShouldMatch(n) is translated by
> BooleanQuery.toString() into ~n. But this is not a valid query syntax. So
> how can we express this via query syntax in Solr?
>
> And more general question:
> Given a Lucene Query object which was built programmatically by legacy
> code (which is using Lucene and not Solr), is there any way to translate it
> into a Solr query (which must be a string)? As Query.toString() doesn't have
> to be valid Lucene query syntax, does it mean that the Solr query string
> must be manually translated from the Lucene query object? Is there any
> utility that performs this job? And, again, what about queries not
> supported by the query syntax, like CustomScoreQuery, PayloadTermQuery,
> etc.? Are we always limited in Solr by the query parser syntax?
>
> Thanks,
> Leonid
>


Re: unable to upload the solr configuration to zookeeper

2014-12-31 Thread David Philip
Hi Aman,

   This error could be because the Solr instance is looking for the
dependent logger jars. You should copy the jar files from the Solr download
(solr/example/lib/ext) to the Tomcat lib [1].


[1]
https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty
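As a hedged sketch of that copy step in code (the Solr and Tomcat paths are placeholders; adjust them to your actual installs):

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

public class CopyLoggingJars {

    // Copy every jar from the Solr example's logging lib into Tomcat's lib
    // directory, returning how many jars were copied.
    static int copyJars(Path from, Path to) throws IOException {
        int copied = 0;
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(from, "*.jar")) {
            for (Path jar : jars) {
                Files.copy(jar, to.resolve(jar.getFileName()),
                        StandardCopyOption.REPLACE_EXISTING);
                copied++;
            }
        }
        return copied;
    }

    public static void main(String[] args) throws IOException {
        // Placeholder paths - point these at your actual installs.
        Path from = Paths.get("solr/example/lib/ext");
        Path to = Paths.get("tomcat/lib");
        if (Files.isDirectory(from) && Files.isDirectory(to)) {
            System.out.println("copied " + copyJars(from, to) + " jars");
        } else {
            System.out.println("adjust the placeholder paths to your install");
        }
    }
}
```

A plain `cp solr/example/lib/ext/*.jar $TOMCAT_HOME/lib/` achieves the same thing; the point is simply that the log4j/slf4j jars must end up on the container's classpath.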






On Wed, Dec 31, 2014 at 7:21 PM, Aman Tandon 
wrote:

> Hi,
>
> I am trying to configure SolrCloud. I followed the myjeeva
> tutorial (myjeeva
> link
> <
> http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html#zookeeper-ensemble-depolyment
> >)
> because I want to configure it with Tomcat. All 5 ZooKeeper servers are
> running fine.
>
> Now when I try to upload the Solr configuration files it gives me an
> error. I am using Solr 4.10.2 (Operating System: Linux Mint). Please help.
>
> *command: *java -classpath .:/home/aman/solr_cloud/solr-cli-lib/*
> > org.apache.solr.cloud.ZkCLI HELP -cmd upconfig -zkhost
> >
> localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185
> > -confdir /home/aman/solr_cloud/config-files -confname myconf
> >
>
>
>
> > log4j:WARN No appenders could be found for logger
> > (org.apache.zookeeper.ZooKeeper).
> > log4j:WARN Please initialize the log4j system properly.
> > log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
> > more info.
> > -confdir and -confname are required for upconfig
>
>
> With Regards
> Aman Tandon
>


Re: Upgrading Solr from 1.4.1 to 4.10

2014-11-28 Thread David Philip
Hi Raja,

  Could you please mention the list of Solr features that you were/are
using in Solr 1.4? There have been tremendous changes from 1.4 to 4.10.
Also, you may have to explore SolrCloud to resolve the indexing
issues. But what kind of indexing problems are you facing?

You should look at the link mentioned below. The best way to upgrade from
such an old version to the latest is to configure the features that you were
using in Solr 1.4 in Solr 4.10, run your test cases, and start using it.

Thanks - David.

https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr











On Fri, Nov 28, 2014 at 3:14 PM, 
wrote:

> Hi Team,
>
> We are using Apache Solr 1.4.1 for our project. Nowadays we are facing
> many problems regarding Solr indexing, and when we checked the website we
> found the latest version is 4.10. Could you please help us in upgrading Solr?
>
> Is there any specific things which we need to change from our current setup
>
> Regards,
> Raja
> +91-8121704967
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipient(s) and may contain confidential and privileged
> information. If you are not the intended recipient(s), please reply to the
> sender and destroy all copies of the original message. Any unauthorized
> review, use, disclosure, dissemination, forwarding, printing or copying of
> this email, and/or any action taken in reliance on the contents of this
> e-mail is strictly prohibited and may be unlawful. Where permitted by
> applicable law, this e-mail and other e-mail communications sent to and
> from Cognizant e-mail addresses may be monitored.
>


Edismax Phrase Search

2014-11-13 Thread David Philip
Hi All,

   How do I do a phrase search and then a term-proximity search using the
edismax query parser?
For example: if the search term is "red apples", products having "red
apples" in their fields should be returned first, and then products having
red apples within a term proximity of n.

Thanks.
David
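One possible way to express this with edismax, sketched under the assumption that its pf (phrase fields) and ps (phrase slop) boosting parameters fit the requirement; the host and field names are made up for illustration:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class EdismaxPhraseBoost {

    // Boost documents where the whole query matches the field as a phrase
    // (pf), allowing up to `slop` intervening positions (ps): exact-phrase
    // matches score highest, near matches within the slop window next.
    static String buildUrl(String base, String query, String field, int slop) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return base + "/select?defType=edismax&qf=" + field
                + "&q=" + q + "&pf=" + field + "&ps=" + slop;
    }

    public static void main(String[] args) {
        System.out.println(buildUrl(
                "http://localhost:8983/solr/products", "red apples", "name", 3));
    }
}
```

Note this is boosting, not filtering: documents matching only the individual terms still match, they just rank below the phrase and near-phrase matches.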


Clear Solr Admin Interface Logging page's logs

2014-10-29 Thread David Philip
Hi,

 Is there a way to clear the Solr admin interface Logging page's logs?

I understand that we can change the logging level, but what if I just want to
clear the logs, reload the collection, and then see only the latest entries
rather than past ones?
Is there a manual way, or a location I can clear, so that I see only the
latest logs?


Thanks
David


Spell Check for Multiple Words

2014-10-24 Thread David Philip
Hi,

   I am trying to obtain multi-word spellcheck suggestions. For example, I
have a title field with content "Indefinite and fictitious large numbers",
and a user searched for "larg numberr". In that case, I want to obtain
"large number" as a suggestion from the spellcheck suggestions. Could you
please tell me what the configuration should be to get this?

The field type is text_general [that which is defined in example schema.xml]


Thanks
 David.


Word Break Spell Checker Implementation algorithm

2014-10-20 Thread David Philip
Hi,

Could you please point me to a link where I can learn about the
theory behind the implementation of the word break spell checker?
We know that Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what is the algorithm behind the word break spell
checker component? How does it detect the space that is needed if it
doesn't use shingles?


Thanks - David
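The exact algorithm lives in Lucene's WordBreakSpellChecker source, but the core idea - trying candidate split points against the index's term dictionary rather than a shingle field - can be sketched with a toy dictionary standing in for the real terms. This is an illustration of the idea, not the actual Lucene implementation:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class WordBreakSketch {

    // Toy stand-in for the index's term dictionary.
    static final Set<String> TERMS =
            new HashSet<>(Arrays.asList("red", "apples", "large", "numbers"));

    // Try every split point of a run-together word; return the first split
    // whose halves are both known terms, or null if none works. (Lucene's
    // WordBreakSpellChecker explores splits recursively, ranks candidates
    // by term frequency, and also supports combining adjacent words.)
    static String suggestBreak(String word) {
        for (int i = 1; i < word.length(); i++) {
            String left = word.substring(0, i);
            String right = word.substring(i);
            if (TERMS.contains(left) && TERMS.contains(right)) {
                return left + " " + right;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(suggestBreak("redapples"));    // red apples
        System.out.println(suggestBreak("largenumbers")); // large numbers
    }
}
```

Because the candidates are checked directly against indexed terms, no shingle field is needed: the index itself tells the checker which splits produce real words.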


Re: Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
Sorry, the analysis page clip is getting trimmed off and hence the
indentation is lost.

Here it is :

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz|
care

expected:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers |
makerz| *ride
care*



On Wed, Oct 15, 2014 at 7:21 PM, David Philip 
wrote:

> contd..
>
> My expectation was that "ride care" should not have been split into two
> tokens.
>
> It should have been as below. Please correct me/point out where I am wrong.
>
>
> Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
> care
>
> o/p
>
> ridemakersrideridemakerzrideridemarkridemakersmakerz
>
> *ride care*
>
>
>
>
> On Wed, Oct 15, 2014 at 7:16 PM, David Philip  > wrote:
>
>> Hi All,
>>
>>    I remember using multi-word synonyms in Solr 3.x. In the case
>> of multi-words, I was escaping the space with a backslash [\] and it worked
>> as intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
>> the others, so when I searched for ride makers, I obtained the search
>> results for all of them. The field type was the same as below. I have the
>> same setup in Solr 4.10, but now the multi-word space escape is getting
>> ignored. It is tokenizing on spaces.
>>
>>  synonyms.txt
>> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
>> care
>>
>>
>> Analysis page:
>>
>> ridemakersrideridemakerzrideridemarkridemakersmakerzcare
>>
>> Field Type
>>
>> > positionIncrementGap="100">
>>   
>> 
>> > ignoreCase="true" expand="true"/>
>>   
>> 
>>
>>
>>
>> Could you please tell me what could be the issue? How do I handle
>> multi-word cases?
>>
>>
>>
>>
>> synonyms.txt
>> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
>> care
>>
>>
>> Thanks - David
>>
>>
>>
>
>


Re: Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
contd..

My expectation was that "ride care" should not have been split into two
tokens.

It should have been as below. Please correct me/point out where I am wrong.


Input : ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
care

o/p

ridemakersrideridemakerzrideridemarkridemakersmakerz

*ride care*




On Wed, Oct 15, 2014 at 7:16 PM, David Philip 
wrote:

> Hi All,
>
>    I remember using multi-word synonyms in Solr 3.x. In the case
> of multi-words, I was escaping the space with a backslash [\] and it worked
> as intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
> the others, so when I searched for ride makers, I obtained the search
> results for all of them. The field type was the same as below. I have the
> same setup in Solr 4.10, but now the multi-word space escape is getting
> ignored. It is tokenizing on spaces.
>
>  synonyms.txt
> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care
>
>
> Analysis page:
>
> ridemakersrideridemakerzrideridemarkridemakersmakerzcare
>
> Field Type
>
>  positionIncrementGap="100">
>   
> 
>  ignoreCase="true" expand="true"/>
>   
> 
>
>
>
> Could you please tell me what could be the issue? How do I handle
> multi-word cases?
>
>
>
>
> synonyms.txt
> ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care
>
>
> Thanks - David
>
>
>


Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
Hi All,

   I remember using multi-word synonyms in Solr 3.x. In the case of
multi-words, I was escaping the space with a backslash [\] and it worked as
intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to the
others, so when I searched for ride makers, I obtained the search results
for all of them. The field type was the same as below. I have the same setup
in Solr 4.10, but now the multi-word space escape is getting ignored. It is
tokenizing on spaces.

 synonyms.txt
ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care


Analysis page:

ridemakersrideridemakerzrideridemarkridemakersmakerzcare

Field Type


  


  




Could you please tell me what could be the issue? How do I handle
multi-word cases?




synonyms.txt
ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care


Thanks - David


SolrJ POJO Annotations

2014-07-24 Thread David Philip
Hi,

   This question is about using a SolrJ document as a bean. I have an entity
that has another entity within it. Could you please tell me how to annotate
inner entities? The issue I am facing is that the inner entity's fields are
missing while indexing. In the example below, it is only adding Content's
fields and missing out the author name and id.


Example:  "Content" is one class that has "Author" as its has-a
relationship entity.

class Content{

@Field("uniqueId")
String id;

@Field("timeStamp")
Long timeStamp;

//What should be the annotation type for this entity?
Author author;
}

class Author{
@Field("authorName")
String authorName;

@Field("authorId")
String id;

}


My schema xml is:







Thank you. - David


Re: retreive all the fields in join

2014-05-12 Thread David Philip
Hi Aman,

I think it is possible:

1. Use the fl parameter.
2. Add all 4 fields to both schemas [the schemas of core 1 and core 2].
3. While querying, use &fl=id,name,type,page.

It will return all the fields. For a document that has no data for a
field, the field will be an empty string.
Ex:  {id:111, name:"abc", type:"", page:""}
{page:17, type:"fiction", id:"", name:""}


Thanks







On Mon, May 12, 2014 at 7:10 AM, Aman Tandon wrote:

> please help me out here!!
>
> With Regards
> Aman Tandon
>
>
> On Sun, May 11, 2014 at 1:44 PM, Aman Tandon  >wrote:
>
> > Hi,
> >
> > Is there a way to retrieve all the fields present in both
> > cores (core 1 and core 2)?
> >
> > e.g.
> > core1: {id:111,name: "abc" }
> >
> > core2: {page:17, type: "fiction"}
> >
> > I want is that, on querying both the cores I want to retrieve the results
> > containing all the 4 fields, fields id, name from core1 and page, type
> from
> > core2. Is it possible?
> >
> > With Regards
> > Aman Tandon
> >
>


Multi Lingual Analyzer

2014-01-20 Thread David Philip
Hi,



  I have a query on multi-lingual analyzers.

 Which one of the below is the best approach?

1. To develop a translator that translates a/any language to English and
then use the standard English analyzer to analyze - using the translator
both at index time and at search time?

2. To develop a language-specific analyzer and use it by creating a
specific field only for that language?

We have client data coming in different languages: Kannada and Telugu, and
others later. This data is basically the text written by customers in those
languages.

The requirement is to develop analyzers particular to these languages.



Thanks - David


Re: Store Solr OpenBitSets In Solr Indexes

2013-11-02 Thread David Philip
Oh fine. The caution point was useful for me.
Yes, I wanted to do something similar to filter queries. It is not an XY
problem. I am simply trying to implement something as described below.

I have [non-clinical] group sets in the system, and I want to build a bitset
based on the documents belonging to each group and save it,
so that while searching I can retrieve the corresponding bitset from the
Solr engine for the matched documents and then execute a logical XOR. [Is my
problem explanation clear now?]

So what I am looking for is: if I have to retrieve a bitset instance from
the Solr search engine for the matched documents, how can I get it?
And how do I save the bit mapping for the documents belonging to a
particular group, thus enabling the XOR operation?

Thanks - David
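One hedged sketch of the storage side, using java.util.BitSet in place of Lucene's OpenBitSet (OpenBitSet exposes its backing long[] via getBits(), so the same idea should carry over): serialize the bitset to bytes, store those in a binary (or base64 string) field, and XOR after reading back. The field usage and sizes here are illustrative assumptions:

```java
import java.util.BitSet;

public class BitSetRoundTrip {

    public static void main(String[] args) {
        // Build the group's bitset, as in the original snippet.
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(1000);

        // Serialize to bytes - this is what would go into a stored
        // binary field (base64-encoded under the covers).
        byte[] stored = bits.toByteArray();

        // Later: read the bytes back from the document and rebuild.
        BitSet restored = BitSet.valueOf(stored);

        // XOR against the bitset of matched documents.
        BitSet matched = new BitSet();
        matched.set(0);
        restored.xor(matched);          // restored is mutated in place

        System.out.println(restored.get(0));    // false: bit 0 cancelled
        System.out.println(restored.get(1000)); // true: bit 1000 survives
    }
}
```

Erick's caution above still applies: if the bit positions are internal Lucene doc IDs, they become stale after segment merges, so the stored bits should be keyed by your own stable IDs.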










On Fri, Nov 1, 2013 at 5:05 PM, Erick Erickson wrote:

> Why are you saving this? Because if the bitset you're saving
> has anything to do with, say, filter queries, it's probably useless.
>
> The internal bitsets are often based on the internal Lucene doc ID,
> which will change when segment merges happen, thus the caution.
>
> Otherwise, there's the binary type you can probably use. It's not very
> efficient, since I believe it uses base-64 encoding under the covers,
> though...
>
> Is this an "XY" problem?
>
> Best,
> Erick
>
>
> On Wed, Oct 30, 2013 at 8:06 AM, David Philip
> wrote:
>
> > Hi All,
> >
> > What should be the field type if I have to save solr's open bit set value
> > within solr document object and retrieve it later for search?
> >
> >   OpenBitSet bits = new OpenBitSet();
> >
> >   bits.set(0);
> >   bits.set(1000);
> >
> >   doc.addField("SolrBitSets", bits);
> >
> >
> > What should be the field type of  SolrBitSets?
> >
> > Thanks
> >
>


Store Solr OpenBitSets In Solr Indexes

2013-10-30 Thread David Philip
Hi All,

What should be the field type if I have to save solr's open bit set value
within solr document object and retrieve it later for search?

  OpenBitSet bits = new OpenBitSet();

  bits.set(0);
  bits.set(1000);

  doc.addField("SolrBitSets", bits);


What should be the field type of  SolrBitSets?

Thanks


Re: Storing 2 dimension array in Solr

2013-10-14 Thread David Philip
Hi,

  I will check the pseudo join.

Jack,
I doubt we can de-normalize further. The rest of the points you gave me I
will take. Thank you.
Basically, we have 2 different Solr indexes. One table is rarely updated,
but this group-disease table is frequently updated, and new diseases are
added very often. So we maintain them separately. While querying we need a
join operation on tables 1 and 2.

So far, I could create a test Solr index with 100k dynamic fields per
document. I am yet to test further; it took almost 1.5 hours to create the
index for 1500 groups, with each group having almost 90k dynamic fields.

I also added a doc_static field which copies all the integer sets from the
disease copy fields into this field. While querying, I use only this field
to retrieve.
If there are any better approaches, please let me know.

Thanks - David






On Sun, Oct 13, 2013 at 6:37 PM, Jack Krupansky wrote:

> Yeah, something like that. The key or ID field would probably just be the
> composition of the group and disease fields.
>
> The other thing is if occurrence is simply a boolean, omit it and omit the
> document if that disease is not present for that group. If the majority of
> the diseases are not present for a specified group, that would eliminate a
> lot of documents. Or if occurrence is not a boolean, keep the field, but
> again not add a document if the disease is not present for that group.
>
> My usual, over-generalized rule for dynamic fields is that they are a
> powerful tool, but only if used in moderation. "Millions" would not be
> moderation.
>
> -- Jack Krupansky
>
> -Original Message- From: Lee Carroll
> Sent: Sunday, October 13, 2013 8:35 AM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Storing 2 dimension array in Solr
>
> I think he means a doc for each element. so you have a disease occurrence
> index
>
> 
> 1
> 1
> exist
> 1-1
> 
>
> assuming (and it's a pretty fair assumption?) most groups have only a subset
> of diseases, this will be a sparse matrix, so just don't index
> the occurrence value "does not exist".
>
> Basically, denormalize via adding fields which don't relate to the key.
>
> This will work fine on modest hardware with no thought to performance for <5
> million docs. It will work fine with some thought and hardware for very
> large numbers. It's worth a go anyway just to test. It should probably be
> your first method to try out.
>
>
>
>
> On 13 October 2013 12:10, Erick Erickson  wrote:
>
>  This sounds like a denormalization issue. Don't be afraid.
>>
>> Actually, I've seen from 50M to 300M small docs on a Solr node,
>> depending on query type, hardware, etc. So that gives you a
>> place to start being cautious about the number of docs in your
>> system. If your full expansion of your table numbers in that range,
>> you might be just fine denormalizing the data.
>>
>> Alternatively, there's the "pseudo join" capability to consider. I'm
>> usually hesitant to recommend that, but Joel is committing some
>> really interesting stuff in the join area which you might take a look
>> at if the existing pseudo-join isn't performant enough.
>>
>> But I'd consider denormalizing the data as the first approach.
>>
>> Best,
>> Erick
>>
>>
>> On Sun, Oct 13, 2013 at 8:07 AM, David Philip
>> wrote:
>>
>> > Hi Jack, for the point: "each element of the array as a solr document,
>> with
>> > a group field and a disease field"
>> > Did you mean it this way:
>> >
>> > 
>> >   "group1_grp": G1
>> >  "disease1_d": 2,
>> >  "disease2_d": 3,
>> > 
>> > 
>> >   "group1_grp": G2
>> >  "disease1_d": 2,
>> >  "disease2_d": 3,
>> > "disease3_d":  1,
>> > "disease4_d":  1,
>> > 
>> > similar to first case: having dynamic fields for disease?
>> > Will it be performance issue if disease field increase to millions?
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> > On Sun, Oct 13, 2013 at 9:00 AM, Jack Krupansky <
>> j...@basetechnology.com
>> > >wrote:
>> >
>> > > You may be better off indexing each element of the array as a solr
>> > > document, with a group field and a disease field. Then you can easily
>> and
>> > > efficiently add new diseases. Then to query a row, you query for the
>> > group
>> > > field having the desire

Re: Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi Jack, for the point: "each element of the array as a solr document, with
a group field and a disease field"
Did you mean it this way:


  "group1_grp": G1
 "disease1_d": 2,
 "disease2_d": 3,


  "group1_grp": G2
 "disease1_d": 2,
 "disease2_d": 3,
"disease3_d":  1,
"disease4_d":  1,

similar to first case: having dynamic fields for disease?
Will it be performance issue if disease field increase to millions?











On Sun, Oct 13, 2013 at 9:00 AM, Jack Krupansky wrote:

> You may be better off indexing each element of the array as a solr
> document, with a group field and a disease field. Then you can easily and
> efficiently add new diseases. Then to query a row, you query for the group
> field having the desired group.
>
> If possible, index the array as being sparse - no document for a disease
> if it is not present for that group.
>
> -- Jack Krupansky
>
> -Original Message- From: David Philip
> Sent: Saturday, October 12, 2013 9:56 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Storing 2 dimension array in Solr
>
>
> Hi Erick, yes it is. But the columns here are dynamically and very
> frequently added. They can increase up to 1 million right now. So, 1 document
> with 1 million dynamic fields - is that fine? Or is there another approach?
>
> While searching the web, I found that docValues are column-oriented:
> http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
> However, I did not understand how to use docValues to add these columns.
>
> What is the recommended approach?
>
> Thanks - David
>
>
>
>
>
>
> On Sun, Oct 13, 2013 at 3:33 AM, Erick Erickson wrote:
>
>  Isn't this just indexing each row as a separate document
>> with a suitable ID "groupN" in your example?
>>
>>
>> On Sat, Oct 12, 2013 at 2:43 PM, David Philip
>> wrote:
>>
>> > Hi Erick,
>> >
>> >We have set of groups as represented below. New columns (diseases as
>> in
>> > below matrix) keep coming and we need to add them as new column. To that
>> > column, we have values such as 1 or 2 or 3 or 4 (exist, slight, na,
>> > notfound) for respective groups.
>> >
>> > While querying we need  to get the entire row for group:"group1".  We
>> will
>> > not be searching on columns(*_disease) values, index=false but stored is
>> > true.
>> >
>> > for ex: we use, get group:"group1" and we need to get the entire row-
>> > exist,slight, not found. Hoping this explanation is clearer.
>> >
>> >disease1disease2 disease3
>> > group1exist slight  not found
>> > groups2   slightnot foundexist
>> > group3slight exist
>> > groupK-na exist
>> >
>> >
>> >
>> > Thanks - David
>> >
>> >
>> >
>> >
>> >
>> > On Sat, Oct 12, 2013 at 11:39 PM, Erick Erickson <
>> erickerick...@gmail.com
>> > >wrote:
>> >
>> > > David:
>> > >
>> > > This feels like it may be an XY problem. _Why_ do you
>> > > want to store a 2-dimensional array and what
>> > > do you want to do with it? Maybe there are better
>> > > approaches.
>> > >
>> > > Best
>> > > Erick
>> > >
>> > >
>> > > On Sat, Oct 12, 2013 at 2:07 AM, David Philip
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > >   I have a 2 dimension array and want it to be persisted in solr. How
>> > > > can I do that?
>> > > >
>> > > > Sample case:
>> > > >
>> > > >  disease1disease2 disease3
>> > > > group1exist slight  not found
>> > > > groups2   slightnot foundexist
>> > > > group2slight exist
>> > > >
>> > > > exist-1 not found - 2 slight-3 .. can be stored like this also.
>> > > >
>> > > > Note: This array has frequent updates.  Every time new disease get's
>> > > added
>> > > > and I have to add description about that disease to all groups. And
>> at
>> > > > query time, I will do get by row  - get by group only group = group2
>> > row.
>> > > >
>> > > > Any suggestion on how I can achieve this?  I am thankful to the forum
>> > > > for replying with patience; on achieving this, I will blog and will
>> > > > share it with all.
>> > > >
>> > > > Thanks - David
>> > > >
>> > >
>> >
>>
>>
>


Re: Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi Erick, yes it is. But the columns here are dynamically and very
frequently added. They can increase up to 1 million right now. So, 1 document
with 1 million dynamic fields - is that fine? Or is there another approach?

While searching the web, I found that docValues are column-oriented:
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
However, I did not understand how to use docValues to add these columns.

What is the recommended approach?

Thanks - David






On Sun, Oct 13, 2013 at 3:33 AM, Erick Erickson wrote:

> Isn't this just indexing each row as a separate document
> with a suitable ID "groupN" in your example?
>
>
> On Sat, Oct 12, 2013 at 2:43 PM, David Philip
> wrote:
>
> > Hi Erick,
> >
> >We have set of groups as represented below. New columns (diseases as
> in
> > below matrix) keep coming and we need to add them as new column. To that
> > column, we have values such as 1 or 2 or 3 or 4 (exist, slight, na,
> > notfound) for respective groups.
> >
> > While querying we need  to get the entire row for group:"group1".  We
> will
> > not be searching on columns(*_disease) values, index=false but stored is
> > true.
> >
> > for ex: we use, get group:"group1" and we need to get the entire row-
> > exist,slight, not found. Hoping this explanation is clearer.
> >
> >disease1disease2 disease3
> > group1exist slight  not found
> > groups2   slightnot foundexist
> > group3slight exist
> > groupK-na exist
> >
> >
> >
> > Thanks - David
> >
> >
> >
> >
> >
> > On Sat, Oct 12, 2013 at 11:39 PM, Erick Erickson <
> erickerick...@gmail.com
> > >wrote:
> >
> > > David:
> > >
> > > This feels like it may be an XY problem. _Why_ do you
> > > want to store a 2-dimensional array and what
> > > do you want to do with it? Maybe there are better
> > > approaches.
> > >
> > > Best
> > > Erick
> > >
> > >
> > > On Sat, Oct 12, 2013 at 2:07 AM, David Philip
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > >   I have a 2 dimension array and want it to be persisted in solr. How
> > > can I
> > > > do that?
> > > >
> > > > Sample case:
> > > >
> > > >  disease1disease2 disease3
> > > > group1exist slight  not found
> > > > groups2   slightnot foundexist
> > > > group2slight exist
> > > >
> > > > exist-1 not found - 2 slight-3 .. can be stored like this also.
> > > >
> > > > Note: This array has frequent updates.  Every time new disease get's
> > > added
> > > > and I have to add description about that disease to all groups. And
> at
> > > > query time, I will do get by row  - get by group only group = group2
> > row.
> > > >
> > > > Any suggestion on how I can achieve this?  I am thankful to the forum
> > for
> > > > replying with patience, on achieving this, i will blog and will share
> > it
> > > > with all.
> > > >
> > > > Thanks - David
> > > >
> > >
> >
>


Re: Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi Erick,

   We have a set of groups as represented below. New columns (diseases, as in
the matrix below) keep coming, and we need to add each one as a new column. In
that column we have values such as 1, 2, 3, or 4 (exist, slight, na,
not found) for the respective groups.

While querying we need to get the entire row for group:"group1". We will
not be searching on the column (*_disease) values; indexed=false but stored is
true.

For example: we issue get group:"group1" and we need to get the entire row -
exist, slight, not found. Hoping this explanation is clearer.

   disease1disease2 disease3
group1exist slight  not found
groups2   slightnot foundexist
group3slight exist
groupK-na exist



Thanks - David





On Sat, Oct 12, 2013 at 11:39 PM, Erick Erickson wrote:

> David:
>
> This feels like it may be an XY problem. _Why_ do you
> want to store a 2-dimensional array and what
> do you want to do with it? Maybe there are better
> approaches.
>
> Best
> Erick
>
>
> On Sat, Oct 12, 2013 at 2:07 AM, David Philip
> wrote:
>
> > Hi,
> >
> >   I have a 2 dimension array and want it to be persisted in solr. How
> can I
> > do that?
> >
> > Sample case:
> >
> >  disease1disease2 disease3
> > group1exist slight  not found
> > groups2   slightnot foundexist
> > group2slight exist
> >
> > exist-1 not found - 2 slight-3 .. can be stored like this also.
> >
> > Note: This array has frequent updates.  Every time new disease get's
> added
> > and I have to add description about that disease to all groups. And at
> > query time, I will do get by row  - get by group only group = group2 row.
> >
> > Any suggestion on how I can achieve this?  I am thankful to the forum for
> > replying with patience, on achieving this, i will blog and will share it
> > with all.
> >
> > Thanks - David
> >
>


Storing 2 dimension array in Solr

2013-10-11 Thread David Philip
Hi,

  I have a 2 dimension array and want it to be persisted in solr. How can I
do that?

Sample case:

 disease1disease2 disease3
group1exist slight  not found
groups2   slightnot foundexist
group2slight exist

exist-1 not found - 2 slight-3 .. can be stored like this also.

Note: This array has frequent updates. Every time a new disease gets added,
I have to add a description of that disease to all groups. And at query
time, I will do a get by row - get by group, only the group = group2 row.

Any suggestion on how I can achieve this? I am thankful to the forum for
replying with patience; on achieving this, I will blog about it and share it
with all.

Thanks - David
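The sparse "one document per (group, disease) element" denormalization suggested in the replies above can be sketched like this. Plain maps stand in for SolrInputDocument so the sketch stays self-contained, and the "not found" filter mirrors the omit-absent-cells idea; the field names are assumptions:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SparseDiseaseDocs {

    // One "document" per (group, disease) cell, omitting absent cells.
    // A real implementation would emit SolrInputDocuments instead of maps.
    static List<Map<String, String>> toDocs(Map<String, Map<String, String>> matrix) {
        List<Map<String, String>> docs = new ArrayList<>();
        matrix.forEach((group, row) -> row.forEach((disease, status) -> {
            if (status == null || status.equals("not found")) {
                return; // keep the matrix sparse: skip absent diseases
            }
            Map<String, String> doc = new LinkedHashMap<>();
            doc.put("id", group + "-" + disease); // composite unique key
            doc.put("group", group);
            doc.put("disease", disease);
            doc.put("occurrence", status);
            docs.add(doc);
        }));
        return docs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> m = new LinkedHashMap<>();
        m.put("group1", Map.of("disease1", "exist", "disease2", "slight"));
        m.put("group2", Map.of("disease1", "not found")); // dropped as sparse
        System.out.println(toDocs(m).size()); // 2
    }
}
```

Fetching a row is then a single query on the group field (q=group:group1) rather than one wide document with millions of dynamic fields, and adding a new disease only adds new small documents.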


Re: Solr's Filtering approaches

2013-10-11 Thread David Philip
Groups are pharmaceutical research experiments. The user is presented with a
graph view; he can select some region and all the groups in that region get
included. The user can also modify the groups here, so we didn't maintain the
group information in the same Solr index but externalized it.
I looked at the post filter article. My understanding is that I simply have
to extend it as you did and include an implementation of
"isAllowed(acls[doc], groups)". This will filter the documents in the
collector, and finally this collector will be returned. Am I right?

  @Override
  public void collect(int doc) throws IOException {
    if (isAllowed(acls[doc], user, groups)) super.collect(doc);
  }


Erick, I am interested to know whether I can extend any class that can
return me only the bitset of the documents that match the search query. I
can then do bitset1.and(bitset2OfGroups) and finally collect only those
documents to return to the user. How do I try this approach? Any pointers on
bitsets?

Thanks - David




On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson wrote:

> Well, my first question is why 50K groups is necessary, and
> whether you can simplify that. How a user can manually
> choose from among that many groups is "interesting". But
> assuming they're all necessary, I can think of two things.
>
> If the user can only select ranges, just put in filter queries
> using ranges. Or possibly both ranges and individual entries,
> as fq=group:[1A TO 1A] OR group:(2B 45C 98Z) etc.
> You need to be a little careful how you index these so
> range queries work properly, in the above you'd miss
> 2A because it's sorting lexicographically, you'd need to
> store in some form that sorts like 001A 01A
> and so on. You wouldn't need to show that form to the
> user, just form your fq's in the app to work with
> that form.
>
> If that won't work (you wouldn't want this to get huge), think
> about a "post filter" that would only operate on documents that
> had made it through the select, although how to convey which
> groups the user selected to the post filter is an open
> question.
>
> Best,
> Erick
>
> On Wed, Oct 9, 2013 at 12:23 PM, David Philip
>  wrote:
> > Hi All,
> >
> > I have an issue in handling filters for one of our requirements and
> > would like to get suggestions on the best approaches.
> >
> >
> > *Use Case:*
> >
> > 1.  We have a list of groups, and the number of groups can increase to
> > more than 1 million. Currently we have almost 90 thousand groups in the
> > Solr search system.
> >
> > 2.  Just before the user hits a search, he has the option to select the
> > number of groups he wants to retrieve. [The distinct list of these group
> > names for display is retrieved from another Solr index that has more
> > information about the groups.]
> >
> > 3. User Operation:
> > Say if user selected group 1A  - group 1A.  and searches for
> key:cancer.
> >
> >
> > The current approach I was thinking of is: get search results and filter
> > by the list of group ids selected by the user. But my concern is that when
> > the group list grows to >50k unique ids, this can cause a lot of delay
> > in getting search results. So I wanted to know whether there are different
> > filtering approaches that I can try?
> >
> > I was also thinking of another approach, suggested by my colleague:
> > intersection.
> > Get the group ids selected by the user,
> > get the list of group ids from the search results,
> > intersect the two, and then return only the documents whose group ids
> > intersected. Is this a better way? Can I use any caching
> > technique in this case?
> >
> >
> > - David.
>
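Erick's point about lexicographic sorting can be sketched as a zero-padding normalizer for the indexed form of the group IDs. The 4-digit width is an assumption; size it to the largest expected group number.

```java
public class GroupIdPadding {

    // Left-pad the numeric prefix so string order matches numeric order:
    // without padding, "2A" sorts after "100A"; padded, "0002A" < "0100A".
    static String pad(String groupId) {
        int i = 0;
        while (i < groupId.length() && Character.isDigit(groupId.charAt(i))) i++;
        int num = Integer.parseInt(groupId.substring(0, i));
        return String.format("%04d", num) + groupId.substring(i);
    }

    public static void main(String[] args) {
        System.out.println(pad("2A"));   // 0002A
        System.out.println(pad("45C"));  // 0045C
        System.out.println(pad("98Z"));  // 0098Z
        // range queries over the padded form now behave numerically
    }
}
```

The application would index and query the padded form while still showing the original to the user, as Erick describes.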


Solr's Filtering approaches

2013-10-09 Thread David Philip
Hi All,

I have an issue in handling filters for one of our requirements and
liked to get suggestion  for the best approaches.


*Use Case:*

1.  We have a list of groups, and the number of groups can increase to over 1
million. Currently we have almost 90 thousand groups in the Solr search
system.

2.  Just before the user hits a search, he has the option to select the no. of
 groups he wants to retrieve. [The distinct list of these group names for
display is retrieved from another Solr index that has more information about
groups.]

3. User Operation:
Say if user selected group 1A  - group 1A.  and searches for key:cancer.


The current approach I was thinking of is: get search results and filter
by the list of group ids selected by the user. But my concern is that when
the group list grows to >50k unique ids, this can cause a lot of delay
in getting search results. So I wanted to know whether there are different
filtering approaches that I can try?

I was also thinking of another approach, suggested by my colleague:
intersection.
Get the group ids selected by the user,
get the list of group ids from the search results,
intersect the two, and then return only the documents whose group ids
intersected. Is this a better way? Can I use any caching
technique in this case?


- David.
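A minimal sketch of the filter-query approach described above: joining the user's selected group IDs into one fq parameter. The field name `groupid` is an assumption; with SolrJ you would pass the same string to SolrQuery.addFilterQuery.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.List;

public class GroupFilterQuery {

    // One clause per selected group, OR'ed implicitly inside the parentheses.
    static String buildFq(List<String> groupIds) {
        return "groupid:(" + String.join(" ", groupIds) + ")";
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String fq = buildFq(List.of("1A", "2B", "45C"));
        System.out.println(fq);  // groupid:(1A 2B 45C)

        // URL-encoded as it would appear in the /select request
        System.out.println("q=key:cancer&fq=" + URLEncoder.encode(fq, "UTF-8"));
    }
}
```

On the caching question: Solr caches each fq's result in the filterCache independently of the main query, but a clause with >50k terms is still expensive to parse and evaluate, which is why Erick's reply in the other thread suggests ranges or a post filter instead.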


Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread David Philip
Informative. Useful. Thanks.


On Thu, Mar 14, 2013 at 1:59 PM, Chantal Ackermann <
c.ackerm...@it-agenten.com> wrote:

> Hi all,
>
>
> this is not a question. I just wanted to announce that I've written a blog
> post on how to set up Maven for packaging and automatic testing of a SOLR
> index configuration.
>
>
> http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/
>
> Feedback or comments appreciated!
> And again, thanks for that great piece of software.
>
> Chantal
>
>


Re: debugQuery, explain tag - What does the fieldWeight value refer to?,

2013-03-12 Thread David Philip
Hi,

  Any reply on this: how are the documents sequenced when the
product of tf, idf, coord and fieldNorm is the same for both documents?

Thanks - David



P.S : This link was very useful to understand the scoring in detail:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201008.mbox/%3CAANLkTi=jpph3x5tlkbj_rax5qhex6zrcguiunhqbf...@mail.gmail.com%3E





On Mon, Mar 4, 2013 at 4:08 PM, David Philip wrote:

> Hi Chris,
>
> >    Thank you for the reply. Okay, understood about *fieldWeight*.
>
> I am actually curious to know how the documents are sequenced in this case,
> when the product of tf, idf and fieldNorm is the same for both documents?
>
> AFAIK, in the first step, documents are ordered by
> fieldWeight (the product of tf, idf and fieldNorm), descending [correct?]. But
> if both are the same, then what is the next factor taken into consideration to
> sequence them?
>
>  In the case below, why does doc 1 come first and then doc 2 when both
> scores are the same?
>
> :  1.0469098 =,
> : *(MATCH) fieldWeight(title:updated in 7), *
> : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
> idf(docFreq=2,
> : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
> :
> :  1.0469098 =,
> : *(MATCH) fieldWeight(title:updated in 9), *
> : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
> idf(docFreq=2,
> : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
>
>
>
>
> Thanks  - David
>
>
>
>
>
>
>
>
>
> On Sat, Mar 2, 2013 at 12:23 PM, Chris Hostetter  > wrote:
>
>>
>> :  In the explain tag  (debugQuery=true)
>> : what does the *fieldWeight* value refer to?,
>>
>> fieldWeight is just a label put on the product of the tf, idf,
>> and fieldNorm for that term.  (I don't remember why it's referred to as the
>> "fieldWeight" ... i think it may just be historical, since these are all
>> factors of the "field query" (ie: "term query", as opposed to a "boolean
>> query" across multiple fields)
>>
>>
>> : *1.0469098* is the product of tf, idf and fieldNorm,  for both the
>> records.
>> : But field weight is different. I would like to know what is the field
>>
>> what do you mean "field weight is different" ? ... in both of the examples
>> you posted, the fieldWeight is 1.0469098 ?
>>
>> Are you perhaps referring to the numbers "7" and "9" that appear inside the
>> fieldWeight(...) label?  Those are just referring to the (internal)
>> docids (just like in the "fieldNorm(...)")
>>
>> :  1.0469098 =,
>> : *(MATCH) fieldWeight(title:updated in 7), *
>> : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
>> idf(docFreq=2,
>> : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
>> :
>> :  1.0469098 =,
>> : *(MATCH) fieldWeight(title:updated in 9), *
>> : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
>> idf(docFreq=2,
>> : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
>>
>>
>> -Hoss
>>
>
>
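As Hoss explains in this thread, fieldWeight is literally the product of tf, idf and fieldNorm; the pasted explain numbers can be checked directly (values copied from the thread):

```java
public class FieldWeightCheck {

    public static void main(String[] args) {
        float tf        = 1.0f;       // tf(termFreq(title:updated)=1)
        float idf       = 2.7917595f; // idf(docFreq=2, maxDocs=18)
        float fieldNorm = 0.375f;     // fieldNorm(field=title)

        float fieldWeight = tf * idf * fieldNorm;
        System.out.println(fieldWeight); // matches the 1.0469098 in the explain
    }
}
```

On the tie-break question: when two documents have the same score, Lucene's default sort falls back to the internal doc id in ascending order, which is why the document with the lower doc id is listed first.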


Re: debugQuery, explain tag - What does the fieldWeight value refer to?,

2013-03-04 Thread David Philip
Hi Chris,

   Thank you for the reply. Okay, understood about *fieldWeight*.

I am actually curious to know how the documents are sequenced in this case,
when the product of tf, idf and fieldNorm is the same for both documents?

AFAIK, in the first step, documents are ordered by
fieldWeight (the product of tf, idf and fieldNorm), descending [correct?]. But
if both are the same, then what is the next factor taken into consideration to
sequence them?

 In the case below, why does doc 1 come first and then doc 2 when both
scores are the same?

:  1.0469098 =,
: *(MATCH) fieldWeight(title:updated in 7), *
: product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
idf(docFreq=2,
: maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
:
:  1.0469098 =,
: *(MATCH) fieldWeight(title:updated in 9), *
: product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
idf(docFreq=2,
: maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)




Thanks  - David









On Sat, Mar 2, 2013 at 12:23 PM, Chris Hostetter
wrote:

>
> :  In the explain tag  (debugQuery=true)
> : what does the *fieldWeight* value refer to?,
>
> fieldWeight is just a label put on the product of the tf, idf,
> and fieldNorm for that term.  (I don't remember why it's referred to as the
> "fieldWeight" ... i think it may just be historical, since these are all
> factors of the "field query" (ie: "term query", as opposed to a "boolean
> query" across multiple fields)
>
>
> : *1.0469098* is the product of tf, idf and fieldNorm,  for both the
> records.
> : But field weight is different. I would like to know what is the field
>
> what do you mean "field weight is different" ? ... in both of the examples
> you posted, the fieldWeight is 1.0469098 ?
>
> Are you perhaps referring to the numbers "7" and "9" that appear inside the
> fieldWeight(...) label?  Those are just referring to the (internal)
> docids (just like in the "fieldNorm(...)")
>
> :  1.0469098 =,
> : *(MATCH) fieldWeight(title:updated in 7), *
> : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
> idf(docFreq=2,
> : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
> :
> :  1.0469098 =,
> : *(MATCH) fieldWeight(title:updated in 9), *
> : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
> idf(docFreq=2,
> : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
>
>
> -Hoss
>


Re: Get search results in the order of fields names searched

2013-02-26 Thread David Philip
Hi,

  Thank you for the references. I used edismax and it works. Thanks a lot.
David


On Tue, Feb 26, 2013 at 7:33 PM, Jan Høydahl  wrote:

> Check out dismax (http://wiki.apache.org/solr/ExtendedDisMax)
>
> q="John Hopkins"&defType=edismax&qf=Author^1000 Editors^500 Raw_text^1
>
> It's not strictly layered, but by playing with the numbers you can achieve
> that effect
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> 26. feb. 2013 kl. 14:55 skrev David Philip :
>
> > Hi Team,
> >
> >   Is it possible to get search results in the order of the field names set?
> >
> > Ex: say,
> >
> >   - I have 3 fields : Author, Editors, Raw_text,
> >   - User searched for keyword: "John Hopkins",
> >   - Search query is : q= (Author: "John Hopkins" OR Editors:"John
> Hopkins"
> >   OR Raw_Text:"John Hopkins")
> >
> > Expected result:
> > Results should be returned such that we first get all the documents
> > that have "John Hopkins" in the Author field, then the documents that have
> > "John Hopkins" in Editors, and then the documents that have "John Hopkins" in
> > Raw_text. So if the keyword is in the main field Author, those documents
> > should come first, followed by Editors and Raw_text.
> >
> >
> > doc 1:
> >   Author:   John Hopkins
> >   Editors:  test test test
> >   Raw_text: Mr. John Hopkins book
> > doc 2:
> >   Author:   Micheal Ranold
> >   Editors:  John Hopkins, Micheal, Martin
> >   Raw_text: Micheal is the main author, John Hopkins is co-author
> > doc 3:
> >   Author:   Feymenn
> >   Editors:  Micheal, Martin
> >   Raw_text: John Hopkins
>
>
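The boost gaps in Jan's qf example are what produce the layered ordering. A simplified sketch of the effect: real (e)dismax takes the max over the boosted per-field scores, moderated by its tie parameter, and the per-field scores below are made up for illustration.

```java
public class QfBoostSketch {

    // Simplified (e)dismax scoring: a document's score is the max over the
    // boosted per-field scores; large boost ratios push Author matches above
    // Editors matches when the raw per-field scores are comparable.
    static double score(double author, double editors, double rawText) {
        return Math.max(author * 1000, Math.max(editors * 500, rawText * 1));
    }

    public static void main(String[] args) {
        double inAuthor  = score(0.8, 0.0, 0.9); // "John Hopkins" in Author
        double inEditors = score(0.0, 0.9, 0.9); // only in Editors
        double inRawText = score(0.0, 0.0, 0.9); // only in Raw_text

        System.out.println(inAuthor > inEditors && inEditors > inRawText); // true
    }
}
```

As Jan notes, this is not strictly layered: a very weak Author match can still fall below a strong Editors match, so the boost ratios need tuning against real data.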