Re: Queries not supported by Lucene Query Parser syntax

2015-01-01 Thread David Philip
Hi Leonid,

   Have you had a look at the edismax query parser [1]? It may be of use
for your requirement. I am not sure whether it is exactly what you are
looking for, but your question seemed related to it.


[1] http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax



On Thu, Jan 1, 2015 at 2:38 PM, Leonid Bolshinsky leonid...@gmail.com
wrote:

 Hello,

 Are we always limited by the query parser syntax when passing a query
 string to Solr?
 What about query elements which are not supported by the syntax?
 For example, BooleanQuery.setMinimumNumberShouldMatch(n) is rendered by
 BooleanQuery.toString() as ~n, but that is not valid query parser syntax.
 So how can we express this via query syntax in Solr?

 And a more general question:
 Given a Lucene Query object which was built programmatically by legacy
 code (which uses Lucene, not Solr), is there any way to translate it
 into a Solr query (which must be a string)? As Query.toString() is not
 guaranteed to be valid Lucene query syntax, does that mean the Solr query
 string must be manually translated from the Lucene Query object? Is there
 any utility that performs this job? And, again, what about queries not
 supported by the query syntax, like CustomScoreQuery, PayloadTermQuery,
 etc.? Are we always limited in Solr by the query parser syntax?

 Thanks,
 Leonid
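For the minimum-should-match case specifically, edismax exposes it as the mm request parameter, so it can travel outside the query string proper. A minimal sketch in plain Java (no SolrJ dependency; the query text and parameter assembly are illustrative) of building such a request:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class MinShouldMatchQuery {
    // Build the query-string part of a Solr request that applies the
    // equivalent of BooleanQuery.setMinimumNumberShouldMatch(n) via
    // edismax's mm parameter instead of query syntax.
    static String buildParams(String query, int minShouldMatch)
            throws UnsupportedEncodingException {
        return "q=" + URLEncoder.encode(query, "UTF-8")
                + "&defType=edismax"
                + "&mm=" + minShouldMatch;
    }

    public static void main(String[] args) throws Exception {
        // In spirit: three SHOULD clauses, at least two must match.
        System.out.println(buildParams("red apples fresh", 2));
        // q=red+apples+fresh&defType=edismax&mm=2
    }
}
```

With SolrJ the same parameters would be set on a SolrQuery object; either way, the constraint lives in request parameters rather than in the parsed query string.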



Re: unable to upload the solr configuration to zookeeper

2014-12-31 Thread David Philip
Hi Aman,

   This error could be because the Solr instance is looking for the
dependent logger jars. You should copy the jar files from the Solr
download (solr/example/lib/ext) to the Tomcat lib [1].


[1]
https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty






On Wed, Dec 31, 2014 at 7:21 PM, Aman Tandon amantandon...@gmail.com
wrote:

 Hi,

 I am trying to configure SolrCloud. I followed the myjeeva tutorial (
 http://myjeeva.com/solrcloud-cluster-single-collection-deployment.html#zookeeper-ensemble-depolyment
 ) because I want to configure it with Tomcat. All 5 ZooKeeper servers are
 running fine.

 Now when I try to upload the Solr configuration files it gives me an
 error. I am using Solr 4.10.2 (Operating System: Linux Mint). Please help.

 *command: *java -classpath .:/home/aman/solr_cloud/solr-cli-lib/*
  org.apache.solr.cloud.ZkCLI HELP -cmd upconfig -zkhost
 
 localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185
  -confdir /home/aman/solr_cloud/config-files -confname myconf
 



  log4j:WARN No appenders could be found for logger
  (org.apache.zookeeper.ZooKeeper).
  log4j:WARN Please initialize the log4j system properly.
  log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for
  more info.
  -confdir and -confname are required for upconfig


 With Regards
 Aman Tandon
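For reference, the trailing usage message ("-confdir and -confname are required for upconfig") suggests the options were not picked up as intended. One guess (not verified against ZkCLI's argument parsing) is the stray HELP token in the command; a sketch of the invocation without it, using the same paths and ports as above:

```shell
java -classpath .:/home/aman/solr_cloud/solr-cli-lib/* \
  org.apache.solr.cloud.ZkCLI \
  -cmd upconfig \
  -zkhost localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 \
  -confdir /home/aman/solr_cloud/config-files \
  -confname myconf
```

If it still fails, also check that each option and its value ended up on the same shell line; the line wrapping in the mail may have split them.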



Re: Upgrading Solr from 1.4.1 to 4.10

2014-11-28 Thread David Philip
Hi Raja,

  Could you please mention the list of Solr features that you were/are
using in Solr 1.4? There have been tremendous changes between 1.4 and 4.10.
Also, you may have to explore SolrCloud for resolving the indexing
operation. But what kind of indexing problems are you facing?

You should look at the link mentioned below. The best way to upgrade from
such an old version to the latest is to configure the features that you
were using in Solr 1.4 in Solr 4.10, run your test cases, and start using
it.

Thanks - David.

https://cwiki.apache.org/confluence/display/solr/Upgrading+Solr











On Fri, Nov 28, 2014 at 3:14 PM, rajadilipchowdary.ko...@cognizant.com
wrote:

 Hi Team,

 We are using Apache Solr 1.4.1 for our project. Nowadays we are facing
 many problems regarding Solr indexing, and when we checked the website we
 found the latest version is 4.10. Could you please help us in upgrading
 Solr?

 Is there anything specific which we need to change from our current setup?

 Regards,
 Raja
 +91-8121704967
 This e-mail and any files transmitted with it are for the sole use of the
 intended recipient(s) and may contain confidential and privileged
 information. If you are not the intended recipient(s), please reply to the
 sender and destroy all copies of the original message. Any unauthorized
 review, use, disclosure, dissemination, forwarding, printing or copying of
 this email, and/or any action taken in reliance on the contents of this
 e-mail is strictly prohibited and may be unlawful. Where permitted by
 applicable law, this e-mail and other e-mail communications sent to and
 from Cognizant e-mail addresses may be monitored.



Edismax Phrase Search

2014-11-13 Thread David Philip
Hi All,

   How can one do a phrase search and then a term-proximity search using
the edismax query parser?
For example: if the search term is "red apples", products having "red
apples" in their fields should be returned first, and then products having
red apples within a term proximity of n.

Thanks.
David
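With edismax, the usual recipe is qf for the term match plus pf (phrase fields) and ps (phrase slop), so exact phrases rank first and near-phrases within n positions rank next. A minimal sketch in plain Java assembling such a request (the field name "name" is an assumption for illustration):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class PhraseBoostQuery {
    // pf boosts documents where the whole query occurs as a phrase in the
    // listed fields; ps lets that phrase match within `slop` positions.
    static String buildParams(String query, String field, int slop)
            throws UnsupportedEncodingException {
        return "q=" + URLEncoder.encode(query, "UTF-8")
                + "&defType=edismax"
                + "&qf=" + field
                + "&pf=" + field
                + "&ps=" + slop;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(buildParams("red apples", "name", 3));
        // q=red+apples&defType=edismax&qf=name&pf=name&ps=3
    }
}
```

Documents with the exact phrase get the largest phrase boost; documents where the terms sit within the slop window still get a smaller boost over scattered term matches.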


Clear Solr Admin Interface Logging page's logs

2014-10-29 Thread David Philip
Hi,

 Is there a way to clear the Solr admin interface logging page's logs?

I understand that we can change the logging level, but what if I just want
to clear the logs, say, reload the collection, and expect to see only the
latest logs and not the past ones?
Is there a manual way, or somewhere I should clear, so that I see only the
latest logs?


Thanks
David


Spell Check for Multiple Words

2014-10-24 Thread David Philip
Hi,

   I am trying to obtain multi-word spellcheck suggestions. For e.g., I
have a title field with the content "Indefinite and fictitious large
numbers" and a user searched for "larg numberr"; in that case, I wanted to
obtain "large number" as a suggestion from the spell check suggestions.
Could you please tell me what the configuration should be to get this?

The field type is text_general [that which is defined in example schema.xml]


Thanks
 David.
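For multi-word suggestions specifically, the collation feature is usually what assembles whole corrected phrases like "large number" out of per-term corrections. A hedged sketch of the relevant pieces, assuming the stock example solrconfig.xml (component and field names are illustrative):

```xml
<!-- solrconfig.xml: spellchecker built from the title field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">title</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>
```

On the request side, spellcheck=true&spellcheck.collate=true&spellcheck.maxCollations=5&spellcheck.maxCollationTries=10 asks Solr to recombine the per-term suggestions into whole phrases; a maxCollationTries greater than zero makes Solr verify each candidate collation actually matches documents, filtering out nonsense combinations.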


Word Break Spell Checker Implementation algorithm

2014-10-20 Thread David Philip
Hi,

Could you please point me to a link where I can learn about the
theory behind the implementation of the word break spell checker?
We know that Solr's DirectSolrSpellChecker component uses the Levenshtein
distance algorithm; what is the algorithm used behind the word break spell
checker component? How does it detect the space that is needed if it
doesn't use shingles?


Thanks - David


Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
Hi All,

   I remember using multi-word synonyms in Solr 3.x. In the case of
multi-word entries, I escaped the space with a backslash [\] and it worked
as intended.  Ex: ride\ makers, riders, rider\ guards.  Each one mapped to
the others, so when I searched for ride makers, I obtained the search
results for all of them. The field type was the same as below. I have the
same setup in Solr 4.10, but now the multi-word space escape is getting
ignored. It is tokenizing on spaces.

 synonyms.txt
ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care


Analysis page:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz | care

Field Type

<fieldType name="text_syn" class="solr.TextField"
positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>



Could you please tell me what could be the issue? How do I handle
multi-word cases?




synonyms.txt
ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\ care


Thanks - David
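One detail worth checking here: SynonymFilterFactory analyzes the entries in the synonyms file with its own tokenizer (whitespace by default), independently of the field's tokenizer, which would split multi-word entries regardless of escaping. Its tokenizerFactory attribute controls that; a hedged sketch of the same field type with it set:

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- tokenizerFactory controls how synonyms.txt entries are tokenized;
         with KeywordTokenizerFactory, "ride makers" stays one token -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"
            tokenizerFactory="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>
```

This is a sketch based on the documented attribute, not a verified reproduction of the 4.10 behavior; the Analysis page is the quickest way to confirm it against the actual setup.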


Re: Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
contd..

The expectation was that ride care should not have been split into two
tokens.

It should have been as below. Please correct me / point out where I am
wrong.


Input: ridemakers, ride makers, ridemakerz, ride makerz, ride\mark, ride\
care

o/p:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz |
*ride care*








Re: Solr Synonyms, Escape space in case of multi words

2014-10-15 Thread David Philip
Sorry, the analysis page clip is getting trimmed off and hence the
indentation is lost.

Here it is:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz |
care

Expected:

ridemakers | ride | ridemakerz | ride | ridemark | ride | makers | makerz |
*ride care*





SolrJ POJO Annotations

2014-07-24 Thread David Philip
Hi,

   This question is related to indexing a SolrJ document as a bean. I have
an entity that has another entity within it. Could you please tell me how
to annotate inner entities? The issue I am facing is that the inner
entity's fields are missing while indexing. In the example below, it only
adds the Content fields and misses out the author name and id.


Example:  Content is one class that has Author as its has-a
relationship entity.

class Content {

    @Field("uniqueId")
    String id;

    @Field("timeStamp")
    Long timeStamp;

    // What should be the annotation type for this entity?
    Author author;
}

class Author {

    @Field("authorName")
    String authorName;

    @Field("authorId")
    String id;
}


My schema xml is:

<field name="uniqueId" type="string" />
<field name="timeStamp" type="long" />
<field name="authorName" type="string" />
<field name="authorId" type="string" />


Thank you. - David
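As far as I can tell, SolrJ's DocumentObjectBinder only maps @Field annotations declared on the bean it is handed; it does not recurse into nested entities, which would explain the missing author fields. The usual workaround is to flatten the inner entity onto the outer bean. A sketch of the flattening in plain Java (the @Field annotations, shown only as comments to keep this stdlib-only, would sit on the flattened fields and line up with the schema above):

```java
// Author as in the original mail
class Author {
    String authorName;
    String id;

    Author(String authorName, String id) {
        this.authorName = authorName;
        this.id = id;
    }
}

// Flattened bean: the inner entity's values are copied onto Content,
// where SolrJ's binder can see them as plain annotated fields.
class Content {
    String id;          // @Field("uniqueId")
    Long timeStamp;     // @Field("timeStamp")
    String authorName;  // @Field("authorName")
    String authorId;    // @Field("authorId")

    Content(String id, Long timeStamp, Author author) {
        this.id = id;
        this.timeStamp = timeStamp;
        this.authorName = author.authorName; // flatten the has-a relation
        this.authorId = author.id;
    }
}

public class FlattenDemo {
    public static void main(String[] args) {
        Content c = new Content("doc-1", 1406160000000L,
                new Author("Jane", "a-42"));
        System.out.println(c.authorName + " " + c.authorId);
        // Jane a-42
    }
}
```

The lack of nested-bean support is an assumption based on the 4.x-era binder; if it holds, flattening at construction time (or via a small mapper) is the simplest route.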


Re: retreive all the fields in join

2014-05-12 Thread David Philip
Hi Aman,

I think it is possible.

1. Use the fl parameter.
2. Add all 4 fields to both schemas [the schemas of core 1 and core 2].
3. While querying, use fl=id,name,type,page.

It will return all the fields. For a document that has no data for a
field, the field will be an empty string.
Ex:  {"id":"111", "name":"abc", "type":"", "page":""}
     {"page":"17", "type":"fiction", "id":"", "name":""}


Thanks







On Mon, May 12, 2014 at 7:10 AM, Aman Tandon amantandon...@gmail.com wrote:

 please help me out here!!

 With Regards
 Aman Tandon


 On Sun, May 11, 2014 at 1:44 PM, Aman Tandon amantandon...@gmail.com
 wrote:

  Hi,
 
  Is there a way to retrieve all the fields present in both
  cores (core1 and core2)?

  e.g.
  core1: {id:111, name: abc}

  core2: {page:17, type: fiction}

  What I want is that, on querying both cores, I retrieve the results
  containing all 4 fields: id and name from core1, and page and type
  from core2. Is it possible?
 
  With Regards
  Aman Tandon
 



Multi Lingual Analyzer

2014-01-20 Thread David Philip
Hi,



  I have a query on multi-lingual analysis.


 Which of the two approaches below is the better one?


1. To develop a translator that translates a/any language to
English and then use the standard English analyzer to analyse, using the
translator both at index time and at search time?

2. To develop a language-specific analyzer and use it by
creating a specific field only for that language?

We have client data coming in different languages, Kannada and Telugu, and
others later. This data is basically the text written by customers in that
language.


The requirement is to develop analyzers particular to these languages.



Thanks - David


Re: Store Solr OpenBitSets In Solr Indexes

2013-11-02 Thread David Philip
Oh fine. The caution point was useful for me.
Yes, I wanted to do something similar to filter queries. It is not an XY
problem. I am simply trying to implement something as described below.

I have [non-clinical] group sets in the system, and I want to build a
bitset based on the documents belonging to a group and save it.
While searching, I want to retrieve the corresponding bitset from the Solr
engine for the matched documents and then execute a logical XOR. [Am I
clear with the problem explanation now?]


So what I am looking for is: if I have to retrieve a bitset instance from
the Solr search engine for the matched documents, how can I get it?
And how do I save the bit mapping for the documents belonging to a
particular group, thus enabling the XOR operation?

Thanks - David










On Fri, Nov 1, 2013 at 5:05 PM, Erick Erickson erickerick...@gmail.com wrote:

 Why are you saving this? Because if the bitset you're saving
 has anything to do with, say, filter queries, it's probably useless.

 The internal bitsets are often based on the internal Lucene doc ID,
 which will change when segment merges happen, thus the caution.

 Otherwise, there's the binary type you can probably use. It's not very
 efficient since I believe it uses base-64 encoding under the covers
 though...

 Is this an XY problem?

 Best,
 Erick





Store Solr OpenBitSets In Solr Indexes

2013-10-30 Thread David Philip
Hi All,

What should be the field type if I have to save solr's open bit set value
within solr document object and retrieve it later for search?

  OpenBitSet bits = new OpenBitSet();

  bits.set(0);
  bits.set(1000);

  doc.addField("SolrBitSets", bits);


What should be the field type of  SolrBitSets?

Thanks
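If the binary field route is taken, one workable shape is to store the bitset's raw bytes in a binary field (Solr base64-encodes binary field values). A hedged sketch with java.util.BitSet standing in for Lucene's OpenBitSet; the field name SolrBitSets is from the mail, and the SolrJ call appears only as a comment to keep the sketch stdlib-only:

```java
import java.util.Base64;
import java.util.BitSet;

public class BitSetField {
    // Serialize a bitset the way a Solr "binary" field would carry it:
    // raw bytes, base64-encoded on the wire.
    static String encode(BitSet bits) {
        return Base64.getEncoder().encodeToString(bits.toByteArray());
    }

    static BitSet decode(String stored) {
        return BitSet.valueOf(Base64.getDecoder().decode(stored));
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(1000);
        // doc.addField("SolrBitSets", bits.toByteArray());  // with SolrJ
        BitSet roundTripped = decode(encode(bits));
        System.out.println(roundTripped.get(0) && roundTripped.get(1000));
        // true
    }
}
```

Note Erick's caution still applies: bits keyed on internal Lucene doc IDs go stale after segment merges, so such a stored bitset is only meaningful if keyed on stable, application-level IDs.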


Re: Storing 2 dimension array in Solr

2013-10-14 Thread David Philip
Hi,

  I will check out pseudo join.

Jack,
I doubt we can de-normalize further. The rest of the points that you told
me, I will take. Thank you.
Basically, we have 2 different Solr indexes. One table is rarely updated,
but this group-disease table has frequent updates and new diseases are
added very often, so we maintain them separately. While querying we need a
join operation on tables 1 and 2.

So far, I could create a test Solr index with 100k dynamic fields per
document. I am yet to test further; it took almost 1.5 hours to create the
index for 1500 groups, with each group having almost 90k dynamic fields.

I also added a doc_static field which copies all the integer sets from the
disease copy fields to this field. While querying, I use only this field
to retrieve.
If there are any better approaches, please let me know.

Thanks - David






On Sun, Oct 13, 2013 at 6:37 PM, Jack Krupansky j...@basetechnology.com wrote:

 Yeah, something like that. The key or ID field would probably just be the
 composition of the group and disease fields.

 The other thing is if occurrence is simply a boolean, omit it and omit the
 document if that disease is not present for that group. If the majority of
 the diseases are not present for a specified group, that would eliminate a
 lot of documents. Or if occurrence is not a boolean, keep the field, but
 again not add a document if the disease is not present for that group.

 My usual, over-generalized rule for dynamic fields is that they are a
 powerful tool, but only if used in moderation. Millions would not be
 moderation.

 -- Jack Krupansky

 -Original Message- From: Lee Carroll
 Sent: Sunday, October 13, 2013 8:35 AM

 To: solr-user@lucene.apache.org
 Subject: Re: Storing 2 dimension array in Solr

 I think he means a doc for each element, so you have a disease occurrence
 index:

 <doc>
   <group>group1</group>
   <dis>dis1</dis>
   <occurrence>exist</occurrence>
   <uniqueField>1-1</uniqueField>
 </doc>

 assuming (and it's a pretty fair assumption?) most groups have only a
 subset of diseases, this will be a sparse matrix, so just don't index a
 document where the occurrence value does not exist;

 basically denormalize by adding fields which don't relate to the key.

 This will work fine on modest hardware and no thought to performance for 5
 million docs. It will work fine with some though and hardware for very
 large numbers. Its worth a go anyway just to test. It should probably be
 your first method to try out.




 On 13 October 2013 12:10, Erick Erickson erickerick...@gmail.com wrote:

  This sounds like a denormalization issue. Don't be afraid G.

 Actually, I've seen from 50M to 300M small docs on a Solr node,
 depending on query type, hardware, etc. So that gives you a
 place to start being cautious about the number of docs in your
 system. If your full expansion of your table numbers in that range,
 you might be just fine denormalizing the data.

 Alternatively, there's the pseudo join capability to consider. I'm
 usually hesitant to recommend that, but Joel is committing some
 really interesting stuff in the join area which you might take a look
 at if the existing pseudo-join isn't performant enough.

 But I'd consider denormalizing the data as the first approach.

 Best,
 Erick



Re: Storing 2 dimension array in Solr

2013-10-13 Thread David Philip
Hi Jack, for the point: each element of the array as a solr document, with
a group field and a disease field
Did you mean it this way:

doc
  group1_grp: G1
  disease1_d: 2,
  disease2_d: 3,
/doc
doc
  group1_grp: G2
  disease1_d: 2,
  disease2_d: 3,
  disease3_d: 1,
  disease4_d: 1,
/doc
Similar to the first case: having dynamic fields for disease?
Will it be a performance issue if the disease fields increase to millions?











On Sun, Oct 13, 2013 at 9:00 AM, Jack Krupansky j...@basetechnology.com wrote:

 You may be better off indexing each element of the array as a solr
 document, with a group field and a disease field. Then you can easily and
 efficiently add new diseases. Then to query a row, you query for the group
 field having the desired group.

 If possible, index the array as being sparse - no document for a disease
 if it is not present for that group.

 -- Jack Krupansky






Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi,

  I have a 2 dimension array and want it to be persisted in solr. How can I
do that?

Sample case:

            disease1     disease2     disease3
group1      exist        slight       not found
groups2     slight       not found    exist
group2      slight       exist

exist-1, not found-2, slight-3 ... the values can be stored like this also.

Note: This array has frequent updates. Every time a new disease gets
added, I have to add a description of that disease to all groups. And at
query time, I will get by row, i.e. get by group: only the group = group2
row.

Any suggestion on how I can achieve this? I am thankful to the forum for
replying with patience; once I achieve this, I will blog about it and
share it with all.

Thanks - David
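One way to persist such a matrix is one Solr document per non-empty cell, i.e. a sparse layout where adding a disease is just adding documents rather than changing the schema. A hedged sketch of the update XML, with illustrative field names:

```xml
<add>
  <!-- one document per (group, disease) pair; absent pairs are simply
       not indexed, keeping the matrix sparse -->
  <doc>
    <field name="id">group1-disease1</field>
    <field name="group">group1</field>
    <field name="disease">disease1</field>
    <field name="occurrence">exist</field>
  </doc>
  <doc>
    <field name="id">group1-disease2</field>
    <field name="group">group1</field>
    <field name="disease">disease2</field>
    <field name="occurrence">slight</field>
  </doc>
</add>
```

A whole row is then fetched with q=group:group1, and an update touches only the documents of the disease being added or changed.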


Re: Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi Erick,

   We have a set of groups as represented below. New columns (diseases, as
in the matrix below) keep coming, and we need to add them as new columns.
In each column we have values such as 1, 2, 3 or 4 (exist, slight, na,
not found) for the respective groups.

While querying we need to get the entire row for group:group1. We will
not be searching on the column (*_disease) values; indexed=false but
stored=true.

For example: we use get group:group1, and we need to get the entire row:
exist, slight, not found. Hoping this explanation is clearer.

            disease1     disease2     disease3
group1      exist        slight       not found
groups2     slight       not found    exist
group3      slight       exist
groupK      na           exist



Thanks - David





On Sat, Oct 12, 2013 at 11:39 PM, Erick Erickson erickerick...@gmail.com wrote:

 David:

 This feels like it may be an XY problem. _Why_ do you
 want to store a 2-dimensional array and what
 do you want to do with it? Maybe there are better
 approaches.

 Best
 Erick





Re: Storing 2 dimension array in Solr

2013-10-12 Thread David Philip
Hi Erick, yes it is. But the columns here are added dynamically and very
frequently. They can increase up to 1 million right now. So, 1 document
with 1 million dynamic fields: is that fine? Or is there any other
approach?

While searching the web, I found that docValues are column oriented:
http://searchhub.org/2013/04/02/fun-with-docvalues-in-solr-4-2/
However, I did not understand how to use docValues to add these columns.

What is the recommended approach?

Thanks - David






On Sun, Oct 13, 2013 at 3:33 AM, Erick Erickson erickerick...@gmail.com wrote:

 Isn't this just indexing each row as a separate document
 with a suitable ID groupN in your example?


 



Re: Solr's Filtering approaches

2013-10-11 Thread David Philip
Groups are pharmaceutical research expts. The user is presented with a
graph view; he can select some region and all the groups in that region
get included. The user can also modify the groups here, so we didn't
maintain the group information in the same Solr index but have
externalized it.
I looked at the post filter article. So my understanding is that I simply
have to extend it as you did and include an implementation of
isAllowed(acls[doc], user, groups). This will filter the documents in the
collector, and finally this collector will be returned. Am I right?

  @Override
  public void collect(int doc) throws IOException {
    if (isAllowed(acls[doc], user, groups)) super.collect(doc);
  }


Erick, I am interested to know whether I can extend any class that can
return me only the bitset of the documents that match the search query. I
can then do bitset1.and(bitset2OfGroups) and finally collect only those
documents to return to the user. How do I try this approach? Any pointers
on bitsets?

Thanks - David
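Whatever Solr class ends up exposing the matched-documents bitset, the intersection step itself is a single AND once both sides share an id space (the hard part, as cautioned elsewhere in these threads, is that internal Lucene doc IDs shift on segment merges). A minimal sketch with java.util.BitSet standing in for Lucene's OpenBitSet:

```java
import java.util.BitSet;

public class BitSetIntersect {
    // Intersection of two doc-id bitsets: documents that both match the
    // search query and belong to a user-selected group.
    static BitSet intersect(BitSet searchMatches, BitSet userGroups) {
        BitSet result = (BitSet) searchMatches.clone(); // keep inputs intact
        result.and(userGroups);
        return result;
    }

    public static void main(String[] args) {
        BitSet searchMatches = new BitSet();
        searchMatches.set(3);
        searchMatches.set(7);
        searchMatches.set(42);

        BitSet userGroups = new BitSet();
        userGroups.set(7);
        userGroups.set(42);
        userGroups.set(99);

        System.out.println(intersect(searchMatches, userGroups));
        // {7, 42}
    }
}
```

In a post filter the same effect falls out of the collect() override shown above: only documents whose bit is set in the user-groups bitset are passed to super.collect().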




On Thu, Oct 10, 2013 at 5:25 PM, Erick Erickson erickerick...@gmail.com wrote:

 Well, my first question is why 50K groups is necessary, and
 whether you can simplify that. How a user can manually
 choose from among that many groups is interesting. But
 assuming they're all necessary, I can think of two things.

 If the user can only select ranges, just put in filter queries
 using ranges. Or possibly both ranges and individual entries,
 as fq=group:[1A TO 1A] OR group:(2B 45C 98Z) etc.
 You need to be a little careful how you put index these so
 range queries work properly, in the above you'd miss
 2A because it's sorting lexicographically, you'd need to
 store in some form that sorts like 001A 01A
 and so on. You wouldn't need to show that form to the
 user, just form your fq's in the app to work with
 that form.

 If that won't work (you wouldn't want this to get huge), think
 about a post filter that would only operate on documents that
 had made it through the select, although how to convey which
 groups the user selected to the post filter is an open
 question.

 Best,
 Erick

 On Wed, Oct 9, 2013 at 12:23 PM, David Philip
 davidphilipshe...@gmail.com wrote:
  Hi All,
 
  I have an issue in handling filters for one of our requirements and
  would like to get suggestions on the best approaches.


  *Use Case:*

  1.  We have a list of groups, and the number of groups can increase up to
  1 million. Currently we have almost 90 thousand groups in the Solr search
  system.

  2.  Just before the user hits a search, he has the option to select the
  groups he wants to retrieve. [The distinct list of these group names for
  display is retrieved from another Solr index that has more information
  about groups.]

  *3. User Operation:*
  Say the user selected group 1A  - group 1A and searches for key:cancer.


  The current approach I was thinking of is: get the search results and
  filter the query by the list of group ids selected by the user. But my
  concern is that when this group list grows to 50k unique ids, it can
  cause a lot of delay in getting search results. So I wanted to know
  whether there are different filtering approaches that I can try.

  I was thinking of one more approach, as suggested by my colleague:
  intersection.
  Get the group ids selected by the user.
  Get the list of group ids from the search results.
  Perform an intersection of both, and then return only those documents
  whose group id is in the intersection. Is this a better way? Can I use
  any caching technique in this case?
 
 
  - David.



Solr's Filtering approaches

2013-10-09 Thread David Philip
Hi All,

I have an issue in handling filters for one of our requirements and
would like to get suggestions on the best approaches.


*Use Case:*

1.  We have a list of groups, and the number of groups can increase up to
1 million. Currently we have almost 90 thousand groups in the Solr search
system.

2.  Just before the user hits a search, he has the option to select the
groups he wants to retrieve. [The distinct list of these group names for
display is retrieved from another Solr index that has more information
about groups.]

*3. User Operation:*
Say the user selected group 1A  - group 1A and searches for key:cancer.


The current approach I was thinking of is: get the search results and
filter the query by the list of group ids selected by the user. But my
concern is that when this group list grows to 50k unique ids, it can cause
a lot of delay in getting search results. So I wanted to know whether
there are different filtering approaches that I can try.

I was thinking of one more approach, as suggested by my colleague:
intersection.
Get the group ids selected by the user.
Get the list of group ids from the search results.
Perform an intersection of both, and then return only those documents
whose group id is in the intersection. Is this a better way? Can I use
any caching technique in this case?


- David.


Re: Blog Post: Integration Testing SOLR Index with Maven

2013-03-14 Thread David Philip
Informative. Useful. Thanks.


On Thu, Mar 14, 2013 at 1:59 PM, Chantal Ackermann 
c.ackerm...@it-agenten.com wrote:

 Hi all,


 this is not a question. I just wanted to announce that I've written a blog
 post on how to set up Maven for packaging and automatic testing of a SOLR
 index configuration.


 http://blog.it-agenten.com/2013/03/integration-testing-your-solr-index-with-maven/

 Feedback or comments appreciated!
 And again, thanks for that great piece of software.

 Chantal




Re: debugQuery, explain tag - What does the fieldWeight value refer to?,

2013-03-12 Thread David Philip
Hi,

  Any reply on this: how are the documents sequenced when the product of
tf, idf, coord, and fieldNorm is the same for both documents?

Thanks - David



P.S.: This link was very useful for understanding the scoring in detail:
http://mail-archives.apache.org/mod_mbox/lucene-java-user/201008.mbox/%3CAANLkTi=jpph3x5tlkbj_rax5qhex6zrcguiunhqbf...@mail.gmail.com%3E
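The numbers in the explain output quoted in this thread can be reproduced by hand, assuming Lucene's classic similarity (tf = sqrt(termFreq), idf = 1 + ln(maxDocs / (docFreq + 1)), and fieldWeight = tf * idf * fieldNorm):

```java
// Recomputing the explain values for fieldWeight(title:updated):
// termFreq = 1, docFreq = 2, maxDocs = 18, fieldNorm = 0.375.
public class FieldWeightCheck {
    public static void main(String[] args) {
        double tf = Math.sqrt(1);                    // tf(termFreq(title:updated)=1)
        double idf = 1 + Math.log(18.0 / (2 + 1));   // idf(docFreq=2, maxDocs=18)
        double fieldNorm = 0.375;                    // fieldNorm(field=title)
        double fieldWeight = tf * idf * fieldNorm;
        System.out.printf("idf=%.7f fieldWeight=%.7f%n", idf, fieldWeight);
        // idf=2.7917595 fieldWeight=1.0469098, matching the explain output
    }
}
```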





On Mon, Mar 4, 2013 at 4:08 PM, David Philip davidphilipshe...@gmail.com wrote:

 Hi Chris,

    Thank you for the reply. Okay, understood about *fieldWeight*.

  I am actually curious to know how the documents are sequenced in this
  case, when the product of tf, idf, and fieldNorm is the same for both
  documents.

  AFAIK, at the first step, documents are ordered by descending fieldWeight
  (the product of tf, idf, and fieldNorm) [correct?]. But if both are the
  same, what is the next factor taken into consideration for ordering?

  In the case below, why does doc 1 come first and then doc 2 when both
  scores are the same?

 :  1.0469098 =,
 : *(MATCH) fieldWeight(title:updated in 7), *
 : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
 idf(docFreq=2,
 : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
 :
 :  1.0469098 =,
 : *(MATCH) fieldWeight(title:updated in 9), *
 : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
 idf(docFreq=2,
 : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)




 Thanks  - David









 On Sat, Mar 2, 2013 at 12:23 PM, Chris Hostetter hossman_luc...@fucit.org
  wrote:


 :  In the explain tag  (debugQuery=true)
 : what does the *fieldWeight* value refer to?,

  fieldWeight is just a label put on the product of the tf, idf,
  and fieldNorm for that term.  (I don't remember why it's referred to as
  the fieldWeight ... I think it may just be historical, since these are
  all factors of the field query, i.e. the term query, as opposed to a
  boolean query across multiple fields.)


 : *1.0469098* is the product of tf, idf and fieldNorm,  for both the
 records.
 : But field weight is different. I would like to know what is the field

 what do you mean field weight is different ? ... in both of the examples
 you posted, the fieldWeight is 1.0469098 ?

  Are you perhaps referring to the numbers 7 and 9 that appear inside the
  fieldWeight(...) label?  Those just refer to the (internal)
  docids (just like in the fieldNorm(...)).

 :  1.0469098 =,
 : *(MATCH) fieldWeight(title:updated in 7), *
 : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
 idf(docFreq=2,
 : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
 :
 :  1.0469098 =,
 : *(MATCH) fieldWeight(title:updated in 9), *
 : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
 idf(docFreq=2,
 : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)


 -Hoss





Re: debugQuery, explain tag - What does the fieldWeight value refer to?,

2013-03-04 Thread David Philip
Hi Chris,

   Thank you for the reply. Okay, understood about *fieldWeight*.

I am actually curious to know how the documents are sequenced in this case,
when the product of tf, idf, and fieldNorm is the same for both documents.

AFAIK, at the first step, documents are ordered by descending fieldWeight
(the product of tf, idf, and fieldNorm) [correct?]. But if both are the
same, what is the next factor taken into consideration for ordering?

In the case below, why does doc 1 come first and then doc 2 when both
scores are the same?

:  1.0469098 =,
: *(MATCH) fieldWeight(title:updated in 7), *
: product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
idf(docFreq=2,
: maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
:
:  1.0469098 =,
: *(MATCH) fieldWeight(title:updated in 9), *
: product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
idf(docFreq=2,
: maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)




Thanks  - David









On Sat, Mar 2, 2013 at 12:23 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 :  In the explain tag  (debugQuery=true)
 : what does the *fieldWeight* value refer to?,

 fieldWeight is just a label put on the product of the tf, idf,
 and fieldNorm for that term.  (I don't remember why it's referred to as
 the fieldWeight ... I think it may just be historical, since these are
 all factors of the field query, i.e. the term query, as opposed to a
 boolean query across multiple fields.)


 : *1.0469098* is the product of tf, idf and fieldNorm,  for both the
 records.
 : But field weight is different. I would like to know what is the field

 what do you mean field weight is different ? ... in both of the examples
 you posted, the fieldWeight is 1.0469098 ?

 Are you perhaps referring to the numbers 7 and 9 that appear inside the
 fieldWeight(...) label?  Those just refer to the (internal)
 docids (just like in the fieldNorm(...)).

 :  1.0469098 =,
 : *(MATCH) fieldWeight(title:updated in 7), *
 : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
 idf(docFreq=2,
 : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)
 :
 :  1.0469098 =,
 : *(MATCH) fieldWeight(title:updated in 9), *
 : product of: 1.0 = tf(termFreq(title:updated)=1), 2.7917595 =
 idf(docFreq=2,
 : maxDocs=18), 0.375 = fieldNorm(field=title, doc=7)


 -Hoss



Re: Get search results in the order of fields names searched

2013-02-26 Thread David Philip
Hi,

  Thank you for the references. I used edismax and it works. Thanks a lot.
David


On Tue, Feb 26, 2013 at 7:33 PM, Jan Høydahl jan@cominvent.com wrote:

 Check out edismax (http://wiki.apache.org/solr/ExtendedDisMax):

 q=John Hopkins&defType=edismax&qf=Author^1000 Editors^500 Raw_text^1

 It's not strictly layered, but by playing with the numbers you can achieve
 that effect.
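For example, such a request could be assembled like this (the host, port, and core name are made-up placeholders; only the parameter values come from this thread):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch: URL-encode the user query and the per-field boosts and build the
// edismax select URL by hand (^ becomes %5E, spaces become +).
public class EdismaxQueryBuilder {
    public static void main(String[] args) {
        String q = URLEncoder.encode("John Hopkins", StandardCharsets.UTF_8);
        String qf = URLEncoder.encode("Author^1000 Editors^500 Raw_text^1",
                StandardCharsets.UTF_8);
        String url = "http://localhost:8983/solr/collection1/select"
                + "?q=" + q + "&defType=edismax&qf=" + qf;
        System.out.println(url);
    }
}
```

In practice a client library such as SolrJ would set these as request parameters rather than concatenating the URL by hand.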

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com

 On 26 Feb 2013, at 14:55, David Philip davidphilipshe...@gmail.com wrote:

  Hi Team,
 
    Is it possible to get search results in the order of the field names set?
 
  Ex: say,

    - I have 3 fields: Author, Editors, Raw_text,
    - The user searched for the keyword: John Hopkins,
    - The search query is: q=(Author:John Hopkins OR Editors:John Hopkins
      OR Raw_text:John Hopkins)
 
  Expected result:
  Results should be returned such that we first get all the documents that
  have John Hopkins in the Author field, then the documents that have John
  Hopkins in Editors, and then the documents that have John Hopkins in
  Raw_text. So if the keyword is in the main field Author, those documents
  should come first, followed by Editors and Raw_text.
 
 
  <result name="response" numFound="3" start="0">
    <doc>
      <str name="Author">John Hopkins</str>
      <str name="Editors">test test test</str>
      <str name="Raw_text">Mr. John Hopkins book</str>
    </doc>
    <doc>
      <str name="Author">Micheal Ranold</str>
      <str name="Editors">John Hopkins, Micheal, Martin</str>
      <str name="Raw_text">Micheal is the main author, John Hopkins is co-author</str>
    </doc>
    <doc>
      <str name="Author">Feymenn</str>
      <str name="Editors">Micheal, Martin</str>
      <str name="Raw_text">John Hopkins</str>
    </doc>
  </result>