Re: Solr Deleting Docs after Indexing

2017-09-11 Thread Kaushik
It was indeed the duplicate IDs. Somehow I thought I had it unique all the
way.

Thanks,
Kaushik

On Mon, Sep 11, 2017 at 3:21 PM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> Do all 4 documents have the same docID (unique key)?
>
> On Mon, Sep 11, 2017 at 2:44 PM, Kaushik <kaushika...@gmail.com> wrote:
>
> > I am using Solr 5.3 and have a custom SolrJ application to write to
> Solr.
> > When I index using this application, I expect to see 4 documents indexed.
> > But for some strange reason, 3 documents get deleted and there is always
> > only 1 document in the index. I say that because the final tally on the
> > Solr Admin console is
> > Num Docs: 1
> > Max Doc: 4
> > Deleted Docs: 3
> >
> >
> > How and where in Solr/logs can I find why the documents are being
> deleted?
> >
> > Thanks,
> > Kaushik
> >
>
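For readers of the archive: the counts on the admin console follow directly from Solr's overwrite semantics. Re-adding a document whose uniqueKey already exists replaces the old version, and the old copy stays flagged as deleted until a segment merge. A toy sketch (plain Python, purely illustrative, not Solr code):

```python
# Illustrative model of Solr's update semantics for documents that share
# a uniqueKey: a re-add overwrites the live document, and the old copy is
# only flagged deleted until segments are merged.
docs = [{"id": "1", "name": "a"}, {"id": "1", "name": "b"},
        {"id": "1", "name": "c"}, {"id": "1", "name": "d"}]

index = {}      # live documents, keyed by uniqueKey
max_doc = 0     # every add consumes an internal doc slot

for d in docs:
    index[d["id"]] = d
    max_doc += 1

num_docs = len(index)
deleted_docs = max_doc - num_docs
print(num_docs, max_doc, deleted_docs)  # matches the admin console: 1 4 3
```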


Solr Deleting Docs after Indexing

2017-09-11 Thread Kaushik
I am using Solr 5.3 and have a custom SolrJ application to write to Solr.
When I index using this application, I expect to see 4 documents indexed.
But for some strange reason, 3 documents get deleted and there is always
only 1 document in the index. I say that because the final tally on the
Solr Admin console is
Num Docs: 1
Max Doc: 4
Deleted Docs: 3


How and where in Solr/logs can I find why the documents are being deleted?

Thanks,
Kaushik


Re: Number of occurrences in Solr Documents

2017-06-29 Thread Kaushik
Thanks to Susheel and Shawn. Unfortunately the Solr version we have is Solr
5.3 and it does not include the totaltermfrequency feature. Is there any
downside to using TermVectorFrequency, like performance issues?

On Thu, Jun 29, 2017 at 11:49 AM, Susheel Kumar <susheel2...@gmail.com>
wrote:

> That's even better. Thanks, Shawn.
>
> On Thu, Jun 29, 2017 at 11:45 AM, Shawn Heisey <apa...@elyograg.org>
> wrote:
>
> > On 6/29/2017 8:40 AM, Kaushik wrote:
> > > We are trying to get the most frequently used words in a collection.
> > > My understanding was that using facet.field=content_txt would give us
> > > that. An example content_txt value is "The fox jumped over another
> > > fox". In such a scenario, I am expecting the facet to return "fox"
> > > with a count of 2. However, we end up getting "fox" with a value of 1.
> > > It appears we are getting the total number of documents that match the
> > > query, as opposed to the total number of times the word occurred. How
> > > can the latter be achieved?
> >
> > Facets count the number of documents, not the number of terms.
> >
> > You might be after the terms component.
> >
> > https://lucene.apache.org/solr/guide/6_6/the-terms-component.html
> >
> > This generally works across the entire index, while facets can operate
> > on documents that match a query.
> >
> > Thanks,
> > Shawn
> >
> >
>
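The difference between the two counts can be sketched in a few lines (illustrative Python, not Solr code):

```python
# Illustrative: document frequency (what faceting reports) vs. total term
# frequency (the count the original question is after).
from collections import Counter

docs = ["The fox jumped over another fox"]

# Total term frequency: every occurrence counts.
term_freq = Counter(w.lower() for d in docs for w in d.split())

# Document frequency: each document counts a term at most once.
doc_freq = Counter(t for d in docs for t in {w.lower() for w in d.split()})

print(term_freq["fox"], doc_freq["fox"])  # 2 1
```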


Number of occurrences in Solr Documents

2017-06-29 Thread Kaushik
Hello,

We are trying to get the most frequently used words in a collection. My
understanding was that using facet.field=content_txt would give us that. An
example content_txt value is "The fox jumped over another fox". In such a
scenario, I am expecting the facet to return "fox" with a count of 2.
However, we end up getting "fox" with a value of 1. It appears we are
getting the total number of documents that match the query, as opposed to
the total number of times the word occurred. How can the latter be achieved?

Thanks,
AK


How does using cacheKey and lookup behave?

2017-01-18 Thread Kaushik
I use the cacheKey, cacheLookup, SortedMapBackedCache in the Data Import
Handler of Solr 5.x to join two or more entities. Does this give me an
equivalent of SQL's inner join? If so, how can I get something similar to a
left join?

Thank you,
Kaushik
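For anyone landing here from the archives, a cached child entity typically looks like the sketch below (table and field names are made up). As I understand DIH, a parent row with no cache hit is still indexed, just without the child's fields, so the default behavior is already closer to a left join than an inner join; verify against your own data.

```xml
<!-- Hypothetical DIH sketch: child rows cached in memory and joined to the
     parent by key. cacheKey names the child column; cacheLookup names the
     parent field to match it against. -->
<entity name="parent" query="SELECT id, name FROM parent_table">
  <entity name="child"
          query="SELECT parent_id, detail FROM child_table"
          cacheImpl="SortedMapBackedCache"
          cacheKey="parent_id"
          cacheLookup="parent.id"/>
</entity>
```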


Is there a Solr limitation on size for document retrieval?

2017-01-05 Thread Kaushik
Hello,

Is there a limit on the size of a document that can be indexed and rendered
by Solr? We use Solr 5.3.1 and while we are able to index a document of 40
MB in size without any issue, we are unable to retrieve the indexed
SolrDocument. Is there any configuration that we can use to spit out the
entire document?

Also, the only reason why we need the whole document is because of the
highlighting feature. It would be great if we can just get a snippet of the
text, instead of the entire content field for highlighting.

Thanks,
Kaushik
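If the goal is snippets rather than the whole stored field, the standard highlighting parameters already cover it: ask for highlights on the large field while leaving it out of fl. A sketch (collection and field names assumed):

```text
http://localhost:8983/solr/collection1/select
    ?q=content:term
    &fl=id,title          <- omit the 40 MB content field from the response
    &hl=true
    &hl.fl=content
    &hl.snippets=3
    &hl.fragsize=200
```

With fl restricted and hl.fragsize bounding each snippet, the response carries only the highlighted fragments instead of the full stored content.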


Adding multiple entities to single core

2015-06-08 Thread Naman Kaushik
Hi Admin, I am a newbie to Solr. I want to establish multiple entities within a
single core so that each entity refers to data from two different tables and
hence indexes the data. Please help me out. Attached are my schema.xml and
data-config.xml files.
Looking forward to a positive response.
--Naman

<dataConfig>
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/db_ipc" user="root" password="" batchSize="1" />
    <document name="listing">
       <entity name="listings_data" pk="property_id" query="SELECT * FROM member_listing">
<field column="l_id" template="l_${member_listing.id}" template="listing" />
<field column="id" name="id" template="listing" />
<field column="property_id" name="property_id" template="listing" />
<field column="member_id" name="member_id" template="listing" />
<field column="property_type_id" name="property_type_id" template="listing" />
<field column="property_for" name="property_for" template="listing" />
<field column="l_location_id" name="location_id" template="listing" />
<field column="l_location" name="location" template="listing" />
<field column="l_city" name="city" template="listing" />
<field column="l_city_other" name="city_other" template="listing" />
<field column="price" name="price" template="listing" />
<field column="area" name="area" template="listing" />
<field column="area_unit" name="area_unit" template="listing" />
<field column="area_in_sqfeet" name="area_in_sqfeet" template="listing" />
<field column="is_negotiable" name="is_negotiable" template="listing" />
<field column="deposit_amount" name="deposit_amount" template="listing" />
<field column="bedrooms" name="bedrooms" template="listing" />
<field column="reposted_date" name="reposted_date" template="listing" />
<field column="contact_name" name="contact_name" template="listing" />
<field column="contact_phone" name="contact_phone" template="listing" />
<field column="contact_mobile" name="contact_mobile" template="listing" />
<field column="contact_email" name="contact_email" template="listing" />
<field column="property_address" name="property_address" template="listing" />
<field column="project_society" name="project_society" template="listing" />
<field column="furnished" name="furnished" template="listing" />
<field column="age_of_construction" name="age_of_construction" template="listing" />
</entity>

 <entity name="user_data" pk="member_id" query="SELECT * FROM member">
<field column="m_id" template="l_${member.member_id}" template="member" />
<field column="member_id" name="m_member_id" template="member" />
<field column="username" name="m_username" template="member" />
<field column="password" name="m_password" template="member" />
<field column="fullname" name="m_fullname" template="member" />
<field column="email" name="m_email" template="member" />
<field column="address" name="m_address" template="member" />
<field column="city" name="m_city" template="member" />
<field column="locality_id" name="m_locality_id" template="member" />
<field column="mobile" name="m_mobile" template="member" />
<field column="member_type" name="m_member_type" template="member" />
</entity>
    </document>
</dataConfig>

<?xml version="1.0" encoding="UTF-8"?>
<!-- Solr managed schema - automatically generated - DO NOT EDIT -->
<schema name="example-data-driven-schema" version="1.5">
  <uniqueKey>id</uniqueKey>
  <fieldType name="ancestor_path" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
  </fieldType>
  <fieldType name="binary" class="solr.BinaryField"/>
  <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true"/>
  <fieldType name="booleans" class="solr.BoolField" multiValued="true" sortMissingLast="true"/>
  <fieldType name="currency" class="solr.CurrencyField" precisionStep="8" currencyConfig="currency.xml" defaultCurrency="USD"/>
  <fieldType name="date" class="solr.TrieDateField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="dates" class="solr.TrieDateField" precisionStep="0" multiValued="true" positionIncrementGap="0"/>
  <fieldType name="descendent_path" class="solr.TextField">
    <analyzer type="index">
      <tokenizer class="solr.PathHierarchyTokenizerFactory" delimiter="/"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
    </analyzer>
  </fieldType>
  <fieldType name="double" class="solr.TrieDoubleField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="doubles" class="solr.TrieDoubleField" precisionStep="0" multiValued="true" positionIncrementGap="0"/>
  <fieldType name="float" class="solr.TrieFloatField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="floats" class="solr.TrieFloatField" precisionStep="0" multiValued="true" positionIncrementGap="0"/>
  <fieldType name="ignored" class="solr.StrField" multiValued="true" indexed="false" stored="false"/>
  <fieldType name="int" class="solr.TrieIntField" precisionStep="0" positionIncrementGap="0"/>
  <fieldType name="ints" class="solr.TrieIntField" precisionStep="0" multiValued="true" positionIncrementGap="0"/>
  <fieldType name="location" class="solr.LatLonType" subFieldSuffix="_coordinate"/>
  <fieldType

Re: Injecting synonymns into Solr

2015-04-30 Thread Kaushik
I am facing the same problem; currently I am resorting to a custom program
to create this file. Hopefully there is a better solution out there.

Thanks,
Kaushik

On Thu, Apr 30, 2015 at 3:58 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com
wrote:

 Hi,

 Does anyone know of a faster method of populating the synonyms.txt file
 than manually typing the words into it, given that there could be
 thousands of synonyms?

 Regards,
 Edwin



Re: Multi term synonyms

2015-04-29 Thread Kaushik
Hi Roman,

Following is my use case:

*Schema.xml*...

   <field name="name" type="text_autophrase" indexed="true" stored="true"/>

<fieldType name="text_autophrase" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
            phrases="autophrases.txt" includeTokens="false"
            replaceWhitespaceWith="X" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true" />
  </analyzer>
</fieldType>

*SolrConfig.xml...*

<requestHandler name="/autophrase" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
     <str name="df">name</str>
     <str name="defType">autophrasingParser</str>
   </lst>
</requestHandler>

<queryParser name="autophrasingParser"
             class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
  <str name="phrases">autophrases.txt</str>
  <str name="replaceWhitespaceWith">X</str>
</queryParser>


*Synonyms.txt*
PEG-20 SORBITAN LAURATE,POLYOXYETHYLENE 20 SORBITAN MONOLAURATE,TWEEN
20,POLYSORBATE 20 [USAN],POLYSORBATE 20 [INCI],POLYSORBATE 20
[II],POLYSORBATE 20 [HSDB],TWEEN-20,PEG-20 SORBITAN,PEG-20 SORBITAN
[VANDF],POLYSORBATE-20,POLYSORBATE 20,SORETHYTAN MONOLAURATE,T-MAZ
20,POLYOXYETHYLENE (20) SORBITAN MONOLAURATE,SORBITAN
MONODODECANOATE,POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE,POLYOXYETHYLENE
SORBITAN MONOLAURATE,POLYSORBATE 20 [MART.],SORBIMACROGOL LAURATE
300,POLYSORBATE 20 [FHFI],FEMA NO. 2915,POLYSORBATE 20 [FCC],POLYSORBATE 20
[WHO-DD],POLYSORBATE 20 [VANDF]

*Autophrase.txt...*

Has all the above phrases in one column

*Indexed document*

<doc>
  <field name="id">31</field>
  <field name="name">Polysorbate 20</field>
</doc>

So when I query Solr /autophrase for tween 20 or FEMA NO. 2915, I expect to
see the record containing Polysorbate 20. i.e.
http://localhost:8983/solr/collection1/autophrase?q=tween+20&wt=json&indent=true
should have retrieved it; but it doesn't.

What could I be doing wrong?

On Wed, Apr 29, 2015 at 2:10 AM, Roman Chyla roman.ch...@gmail.com wrote:

 I'm not sure I understand - the autophrasing filter will allow the
 parser to see all the tokens, so that they can be parsed (and
 multi-token synonyms identified). So if you are using the same
 analyzer at query and index time, they should be able to see the same
 stuff.

 are you using multi-token synonyms, or just entries that look like
 multi synonym? (in the first case, the tokens are separated by null
 byte) - in the second case, they are just strings even with
 whitespaces, your synonym file must contain exactly the same entries
 as your analyzer sees them (and in the same order; or you have to use
 the same analyzer to load the synonym files)

 can you post the relevant part of your schema.xml?


 note: I can confirm that multi-token synonym expansion can be made to
 work, even in complex cases - we do it - but likely, if you need
 multi-token synonyms, you will also need a smarter query parser.
 sometimes your users will use query strings that contain overlapping
 synonym entries, to handle that, you will have to know how to generate
 all possible 'reads', example

 synonym:

 foo bar, foobar
 hey foo, heyfoo

 user input:

 hey foo bar

 possible readings:

 ((hey foo) +bar) OR (hey +(foo bar))

 i'm simplifying it here, the fun starts when you are seeing a phrase query
 :)

 On Tue, Apr 28, 2015 at 10:31 AM, Kaushik kaushika...@gmail.com wrote:
  Hi there,
 
  I tried the solution provided in
 
 https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
  . The mentioned solution works when the indexed data does not have
  alphanumerics or special characters. But in my case the synonyms are
  something like the below.
 
 
   T-MAZ 20  POLYOXYETHYLENE (20) SORBITAN MONOLAURATE  SORBITAN
  MONODODECANOATE  POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE  POLYOXYETHYLENE
  SORBITAN MONOLAURATE  POLYSORBATE 20 [MART.]  SORBIMACROGOL LAURATE
  300  POLYSORBATE
  20 [FHFI]  FEMA NO. 2915
 
  They have alphanumerics, special characters, spaces, etc. Is there a way
  to implement synonyms even in such cases?
 
  Thanks,
  Kaushik
 
  On Mon, Apr 20, 2015 at 11:03 AM, Davis, Daniel (NIH/NLM) [C] 
  daniel.da...@nih.gov wrote:
 
  Handling MESH descriptor preferred terms and such is similar

Re: Multi term synonyms

2015-04-29 Thread Kaushik
Hi Roman,

When I used the debugQuery using
http://localhost:8983/solr/collection1/autophrase?q=tween+20&wt=json&indent=true&debugQuery=true
I see the following in the response. The autophrase plugin seems to be
doing its part. Just not the synonym expansion. When you say use phrase
queries, what do you mean? Please clarify.

"response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  },
  "debug": {
    "rawquerystring": "tween 20",
    "querystring": "tween 20",
    "parsedquery": "name:tweenx20",
    "parsedquery_toString": "name:tweenx20",
    "explain": {},

Thank you,

Kaushik


On Wed, Apr 29, 2015 at 4:00 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Pls post output of the request with debugQuery=true

 Do you see the synonyms being expanded? Probably not.

 You can go to the administer iface, in the analyzer section play with the
 input until you see the synonyms. Use phrase queries too. That will be
  helpful to eliminate the autophrase filter
 On Apr 29, 2015 6:18 AM, Kaushik kaushika...@gmail.com wrote:

  Hi Roman,
 
  Following is my use case:
 
  *Schema.xml*...
 
  <field name="name" type="text_autophrase" indexed="true" stored="true"/>

  <fieldType name="text_autophrase" class="solr.TextField"
             positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
              phrases="autophrases.txt" includeTokens="false"
              replaceWhitespaceWith="X" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true" />
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true" />
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory" />
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
              ignoreCase="true" expand="true" />
      <filter class="solr.StopFilterFactory" ignoreCase="true"
              words="stopwords.txt" enablePositionIncrements="true" />
    </analyzer>
  </fieldType>

  *SolrConfig.xml...*

  <requestHandler name="/autophrase" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <int name="rows">10</int>
      <str name="df">name</str>
      <str name="defType">autophrasingParser</str>
    </lst>
  </requestHandler>

  <queryParser name="autophrasingParser"
               class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
    <str name="phrases">autophrases.txt</str>
    <str name="replaceWhitespaceWith">X</str>
  </queryParser>
 
 
  *Synonyms.txt*
  PEG-20 SORBITAN LAURATE,POLYOXYETHYLENE 20 SORBITAN MONOLAURATE,TWEEN
  20,POLYSORBATE 20 [USAN],POLYSORBATE 20 [INCI],POLYSORBATE 20
  [II],POLYSORBATE 20 [HSDB],TWEEN-20,PEG-20 SORBITAN,PEG-20 SORBITAN
  [VANDF],POLYSORBATE-20,POLYSORBATE 20,SORETHYTAN MONOLAURATE,T-MAZ
  20,POLYOXYETHYLENE (20) SORBITAN MONOLAURATE,SORBITAN
  MONODODECANOATE,POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE,POLYOXYETHYLENE
  SORBITAN MONOLAURATE,POLYSORBATE 20 [MART.],SORBIMACROGOL LAURATE
  300,POLYSORBATE 20 [FHFI],FEMA NO. 2915,POLYSORBATE 20 [FCC],POLYSORBATE
 20
  [WHO-DD],POLYSORBATE 20 [VANDF]
 
  *Autophrase.txt...*
 
  Has all the above phrases in one column
 
  *Indexed document*
 
  <doc>
    <field name="id">31</field>
    <field name="name">Polysorbate 20</field>
  </doc>
 
  So when I query Solr /autophrase for tween 20 or FEMA NO. 2915, I expect
  to see the record containing Polysorbate 20. i.e.
  http://localhost:8983/solr/collection1/autophrase?q=tween+20&wt=json&indent=true
  should have retrieved it; but it doesn't.
 
  What could I be doing wrong?
 
  On Wed, Apr 29, 2015 at 2:10 AM, Roman Chyla roman.ch...@gmail.com
  wrote:
 
   I'm not sure I understand - the autophrasing filter will allow the
   parser to see all the tokens, so that they can be parsed (and
   multi-token synonyms identified). So if you are using the same
   analyzer at query and index time, they should be able to see the same
   stuff.
  
   are you using multi-token synonyms, or just entries that look like
   multi synonym? (in the first case, the tokens are separated by null
   byte) - in the second case, they are just strings even with
   whitespaces, your synonym file must contain exactly the same entries
   as your analyzer sees them (and in the same order; or you have to use
   the same analyzer to load the synonym files)
  
   can you post the relevant part of your schema.xml?
  
  
   note: I can confirm that multi-token synonym expansion can be made to
   work, even in complex cases - we do it - but likely, if you need
   multi-token synonyms, you will also need a smarter query parser.
   sometimes your users will use query strings that contain overlapping
   synonym entries, to handle that, you will have to know how to generate
   all possible

Re: analyzer, indexAnalyzer and queryAnalyzer

2015-04-29 Thread Kaushik
Hi Doug,

Nice explanation of the query parsers. If you get a chance, can you please
take a quick look at the issue I am facing with multi term synonyms as
well?
http://lucene.472066.n3.nabble.com/Mutli-term-synonyms-tt4200960.html#none
is the problem I am facing. I am now able to perform multi-term searches on
most phrases, barring the ones which have special characters used in Solr,
i.e. [], etc.

Your help is much appreciated.

Thanks,
Kaushik

On Wed, Apr 29, 2015 at 9:24 PM, Doug Turnbull 
dturnb...@opensourceconnections.com wrote:

 So Solr has the idea of a query parser. The query parser is a convenient
 way of passing a search string to Solr and having Solr parse it into
 underlying Lucene queries: You can see a list of query parsers here
 http://wiki.apache.org/solr/QueryParser

 What this means is that the query parser does work to pull terms into
 individual clauses *before* analysis is run. It's a parsing layer that sits
 outside the analysis chain. This creates problems like the "sea biscuit"
 problem, whereby we declare "sea biscuit" as a query time synonym of
 "seabiscuit". As you may know, synonyms are checked during analysis.
 However, if the query parser splits up "sea" from "biscuit" before running
 analysis, the query time analyzer will fail. The string "sea" is brought by
 itself to the query time analyzer and of course won't match "sea biscuit".
 Same with the string "biscuit" in isolation. If the full string "sea
 biscuit" was brought to the analyzer, it would see [sea] next to [biscuit]
 and declare it a synonym of "seabiscuit". Thanks to the query parser, the
 analyzer has lost the association between the terms, and both terms aren't
 brought together to the analyzer.

 My colleague John Berryman wrote a pretty good blog post on this

 http://opensourceconnections.com/blog/2013/10/27/why-is-multi-term-synonyms-so-hard-in-solr/

 There's several solutions out there that attempt to address this problem.
 One from Ted Sullivan at Lucidworks

 https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/

 Another popular one is the hon-lucene-synonyms plugin:

 http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html

 Yet another work-around is to use the field query parser:

 http://lucene.apache.org/solr/5_1_0/solr-core/org/apache/solr/search/FieldQParserPlugin.html

 I also tend to write my own query parsers, so on the one hand its annoying
 that query parsers have the problems above, on the flipside Solr makes it
 very easy to implement whatever parsing you think is appropriatte with a
 small bit of Java/Lucene knowledge.

 Hopefully that explanation wasn't too deep, but its an important thing to
 know about Solr. Are you asking out of curiosity, or do you have a specific
 problem?

 Thanks
 -Doug
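The split-before-analysis behavior Doug describes can be reduced to a toy model (illustrative Python, not Solr internals):

```python
# Toy model: the query parser splits on whitespace BEFORE analysis runs,
# so a multi-word synonym entry never sees its full input.
SYNONYMS = {("sea", "biscuit"): "seabiscuit"}

def analyze(tokens):
    # A stand-in analyzer: it can only match a synonym against the exact
    # token sequence it is handed.
    key = tuple(tokens)
    return [SYNONYMS[key]] if key in SYNONYMS else list(tokens)

# Parser-first (edismax-style): each whitespace-separated word analyzed alone.
parser_first = [t for w in "sea biscuit".split() for t in analyze([w])]

# Analyzer-first (what multi-term synonyms need): whole sequence at once.
analyzer_first = analyze("sea biscuit".split())

print(parser_first)    # ['sea', 'biscuit'] -- synonym missed
print(analyzer_first)  # ['seabiscuit']     -- synonym applied
```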

 On Wed, Apr 29, 2015 at 6:32 PM, Steven White swhite4...@gmail.com
 wrote:

  Hi Doug,
 
  I don't understand what you mean by the following:
 
   For example, if a user searches for q=hot dogs&defType=edismax&qf=title
   body the *query parser* *not* the *analyzer* first turns the query
 into:
 
  If I have indexAnalyzer and queryAnalyzer in a fieldType that are 100%
  identical, does the example you provided still stand? If so, why? Or do
  you mean something totally different by query parser?
 
  Thanks
 
  Steve
 
 
  On Wed, Apr 29, 2015 at 4:18 PM, Doug Turnbull 
  dturnb...@opensourceconnections.com wrote:
 
   * 1) If the content of indexAnalyzer and queryAnalyzer are exactly the
    same, that's the same as if I have an analyzer only, right?*
   1) Yes
  
   *  2) Under the hood, all three are the same thing when it comes to
 what
   kind*
   *of data and configuration attributes can take, right?*
   2) Yes. Both take in text and output a token stream.
  
   *What I'm trying to figure out is this: beside being able to configure
  a*
  
    *fieldType to have different analyzer settings at index and query time,
    there is nothing else that's unique about each.*
  
   The only thing to look out for in Solr land is the query parser. Most
  Solr
   query parsers treat whitespace as meaningful.
  
    For example, if a user searches for q=hot dogs&defType=edismax&qf=title
   body the *query parser* *not* the *analyzer* first turns the query
 into:
  
   (title:hot title:dog) | (body:hot body:dog)
  
    each word which *then* gets analyzed. This is because the query parser
    tries to be smart and turn "hot dog" into "hot OR dog", or more
    specifically making them two must clauses.
  
    This trips quite a few folks up; you can use the field query parser,
    which treats the field as a phrase query. Hope that helps
  
  
   --
   *Doug Turnbull **| *Search Relevance Consultant | OpenSource
 Connections,
   LLC | 240.476.9983 | http://www.opensourceconnections.com
   Author: Taming Search http://manning.com/turnbull from Manning
   Publications
   This e-mail and all contents, including attachments, is considered

Re: Multi term synonyms

2015-04-29 Thread Kaushik
Hi Roman,

"Tween 20" also did not retrieve results. So I replaced the whitespaces
in the synonyms.txt with 'x' and now when I search, I get the results back.
One problem however still exists: when I search for POLYSORBATE
20[MART.], which is a synonym for POLYSORBATE 20, I get the error below,
"msg": "org.apache.solr.search.SyntaxError: Cannot parse 'polysORbate
20[mart.]': Encountered \"]\" at line 1, column 20.
Was expecting one of: \"TO\" ... RANGE_QUOTED ... RANGE_GOOP ...",
"code": 400

If I am able to solve this, I think I am pretty close to the solution.
Any thoughts there?

I appreciate your help on this matter.

Thank you,

Kaushik
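The SyntaxError above comes from the query parser treating [ and ] as range-query syntax. Escaping user input before building q avoids it; the sketch below roughly mirrors SolrJ's ClientUtils.escapeQueryChars (illustrative; check the exact character set for your Solr version):

```python
# Sketch: escape Lucene/Solr query-syntax characters before building q.
# The [ and ] in "POLYSORBATE 20 [MART.]" otherwise parse as a range query.
SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape_query_chars(s: str) -> str:
    # Prefix each special character with a backslash.
    return "".join("\\" + c if c in SPECIAL else c for c in s)

q = escape_query_chars("polysorbate 20[mart.]")
print(q)  # polysorbate 20\[mart.\]
```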



On Wed, Apr 29, 2015 at 5:48 PM, Roman Chyla roman.ch...@gmail.com wrote:

 Hi Kaushik, I meant to compare tween 20 against tween 20.

 Your autophrase filter replaces whitespace with x, but your synonym filter
 expects whitespaces. Try that.

 Roman
 On Apr 29, 2015 2:27 PM, Kaushik kaushika...@gmail.com wrote:

  Hi Roman,
 
  When I used the debugQuery using
 
 
  http://localhost:8983/solr/collection1/autophrase?q=tween+20&wt=json&indent=true&debugQuery=true
  I see the following in the response. The autophrase plugin seems to be
  doing its part. Just not the synonym expansion. When you say use phrase
  queries, what do you mean? Please clarify.
 
  "response": {
      "numFound": 0,
      "start": 0,
      "docs": []
    },
    "debug": {
      "rawquerystring": "tween 20",
      "querystring": "tween 20",
      "parsedquery": "name:tweenx20",
      "parsedquery_toString": "name:tweenx20",
      "explain": {},
 
  Thank you,
 
  Kaushik
 
 
  On Wed, Apr 29, 2015 at 4:00 PM, Roman Chyla roman.ch...@gmail.com
  wrote:
 
   Pls post output of the request with debugQuery=true
  
   Do you see the synonyms being expanded? Probably not.
  
   You can go to the administer iface, in the analyzer section play with
 the
   input until you see the synonyms. Use phrase queries too. That will be
    helpful to eliminate the autophrase filter
   On Apr 29, 2015 6:18 AM, Kaushik kaushika...@gmail.com wrote:
  
Hi Roman,
   
Following is my use case:
   
*Schema.xml*...
   
    <field name="name" type="text_autophrase" indexed="true" stored="true"/>
   
    <fieldType name="text_autophrase" class="solr.TextField"
               positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="com.lucidworks.analysis.AutoPhrasingTokenFilterFactory"
                phrases="autophrases.txt" includeTokens="false"
                replaceWhitespaceWith="X" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
                ignoreCase="true" expand="true" />
        <filter class="solr.StopFilterFactory" ignoreCase="true"
                words="stopwords.txt" enablePositionIncrements="true" />
      </analyzer>
    </fieldType>
   
*SolrConfig.xml...*
   
    <requestHandler name="/autophrase" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="echoParams">explicit</str>
        <int name="rows">10</int>
        <str name="df">name</str>
        <str name="defType">autophrasingParser</str>
      </lst>
    </requestHandler>

    <queryParser name="autophrasingParser"
                 class="com.lucidworks.analysis.AutoPhrasingQParserPlugin">
      <str name="phrases">autophrases.txt</str>
      <str name="replaceWhitespaceWith">X</str>
    </queryParser>
   
   
*Synonyms.txt*
PEG-20 SORBITAN LAURATE,POLYOXYETHYLENE 20 SORBITAN MONOLAURATE,TWEEN
20,POLYSORBATE 20 [USAN],POLYSORBATE 20 [INCI],POLYSORBATE 20
[II],POLYSORBATE 20 [HSDB],TWEEN-20,PEG-20 SORBITAN,PEG-20 SORBITAN
[VANDF],POLYSORBATE-20,POLYSORBATE 20,SORETHYTAN MONOLAURATE,T-MAZ
20,POLYOXYETHYLENE (20) SORBITAN MONOLAURATE,SORBITAN
MONODODECANOATE,POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE,POLYOXYETHYLENE
SORBITAN MONOLAURATE,POLYSORBATE 20 [MART.],SORBIMACROGOL LAURATE
300,POLYSORBATE 20 [FHFI],FEMA NO. 2915,POLYSORBATE 20
  [FCC],POLYSORBATE
   20
[WHO-DD],POLYSORBATE 20 [VANDF]
   
*Autophrase.txt...*
   
Has all the above phrases in one column
   
*Indexed document*
   
    <doc>
      <field name="id">31</field>
      <field name="name">Polysorbate 20</field>
    </doc>
   
    So when I query Solr /autophrase for tween 20 or FEMA NO. 2915, I expect
    to see the record containing Polysorbate 20. i.e.
    http://localhost:8983/solr/collection1/autophrase?q=tween+20&wt=json&indent=true
    should have retrieved

Re: Multi term synonyms

2015-04-28 Thread Kaushik
Hi there,

I tried the solution provided in
https://lucidworks.com/blog/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
. The mentioned solution works when the indexed data does not have
alphanumerics or special characters. But in my case the synonyms are
something like the below.


 T-MAZ 20  POLYOXYETHYLENE (20) SORBITAN MONOLAURATE  SORBITAN
MONODODECANOATE  POLY(OXY-1,2-ETHANEDIYL) DERIVATIVE  POLYOXYETHYLENE
SORBITAN MONOLAURATE  POLYSORBATE 20 [MART.]  SORBIMACROGOL LAURATE
300  POLYSORBATE
20 [FHFI]  FEMA NO. 2915

They have alphanumerics, special characters, spaces, etc. Is there a way
to implement synonyms even in such cases?

Thanks,
Kaushik

On Mon, Apr 20, 2015 at 11:03 AM, Davis, Daniel (NIH/NLM) [C] 
daniel.da...@nih.gov wrote:

 Handling MESH descriptor preferred terms and such is similar.   I
 encountered this during evaluation of Solr for a project here at NLM.   We
 decided to use Solr for different projects instead. I considered the
 following approaches:
  - use a custom tokenizer at index time that indexed all of the multiple
 term alternatives.
  - index the data, and then have an enrichment process that queries on
 each source synonym, and generates an update to add the target synonyms.
Follow this with an optimize.
  - During the indexing process, but before sending the data to Solr,
 process the data to tokenize and add synonyms to another field.

 Both the custom tokenizer and enrichment process share the feature that
 they use Solr's own tokenizer rather than duplicate it.   The enrichment
 process seems to me only workable in environments where you can re-index
 all data periodically, so no continuous stream of data to index that needs
 to be handled relatively quickly once it is generated.The last method
 of pre-processing the data seems the least desirable to me from a blue-sky
 perspective, but is probably the easiest to implement and the most
 independent of Solr.

 Hope this helps,

 Dan Davis, Systems/Applications Architect (Contractor),
 Office of Computer and Communications Systems,
 National Library of Medicine, NIH

 -Original Message-
 From: Kaushik [mailto:kaushika...@gmail.com]
 Sent: Monday, April 20, 2015 10:47 AM
 To: solr-user@lucene.apache.org
 Subject: Multi term synonyms

 Hello,

 Reading up on synonyms, it looks like there is no real solution for multi
 term synonyms. Is that right? I have a use case where I need to map one
 multi term phrase to another. i.e. Tween 20 needs to be translated to
 Polysorbate 40.

 Any thoughts as to how this can be achieved?

 Thanks,
 Kaushik
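Of the three approaches Dan lists, the pre-processing one is easy to prototype: before a document goes to Solr, look up its value in a synonym map and write the expansions into a separate searchable field. A sketch (all names illustrative):

```python
# Illustrative pre-processing step: enrich each document with its synonyms
# in a dedicated field before indexing. Field and map contents are made up.
SYNONYMS = {"tween 20": ["polysorbate 20", "t-maz 20"]}

def enrich(doc):
    # Normalize the lookup key the same way the index analyzer would.
    name = doc["name"].lower()
    doc["name_synonyms"] = SYNONYMS.get(name, [])
    return doc

doc = enrich({"id": "31", "name": "Tween 20"})
print(doc["name_synonyms"])  # ['polysorbate 20', 't-maz 20']
```

Searching both the name field and the synonym field then matches multi-term synonyms without any query-time synonym expansion.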



Order of Copy Field and Analyzer

2015-04-23 Thread Kaushik
Hello,


What is the order in which these occur?


   - Copy field
   - Analyzer

The other way of asking the above question, I guess, is: if I copy a _txt
field to a _t field, does the analyzer of _t get the original text sent to
the _txt field, or the analyzed tokens from it?


Thanks,

Kaushik


Correct usage for Synonyms.txt

2015-04-21 Thread Kaushik
Is my understanding of synonyms.txt configuration correct?
1. When the user can search with any of a list of synonyms and the searchable
document can have any synonym, the configuration should be like below.
Fuji, Gala, Braeburn, Crisp => Fuji, Gala, Braeburn, Crisp

2. When the user can search with any of a list of synonyms and the searchable
document can only have a preferred term (e.g. Apple)
Apple, Fuji, Gala, Braeburn, Crisp

OR

Fuji, Gala, Braeburn, Crisp => Apple


Is there any other format that I am missing?

Thank you,
Kaushik
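
A sketch of the two forms, assuming the synonym filter is configured with
expand="true" (terms are illustrative):

```
# Case 1: equivalent synonyms. A plain comma-separated line already expands
# each term to all of the others, so repeating the list on both sides of
# => is unnecessary:
Fuji, Gala, Braeburn, Crisp

# Case 2: map every variant to a single preferred term:
Fuji, Gala, Braeburn, Crisp => Apple
```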


Multi-term synonyms

2015-04-20 Thread Kaushik
Hello,

Reading up on synonyms it looks like there is no real solution for multi
term synonyms. Is that right? I have a use case where I need to map one
multi term phrase to another. i.e. Tween 20 needs to be translated to
Polysorbate 40.

Any thoughts as to how this can be achieved?

Thanks,
Kaushik


Re: generate uuid/ id for table which do not have any primary key

2015-04-20 Thread Kaushik
Have you tried select <concatenated fields> as id, name, age?

On Thu, Apr 16, 2015 at 3:34 PM, Vishal Swaroop vishal@gmail.com
wrote:

 Just wondering if there is a way to generate uuid/ id in data-config
 without using combination of fields in query...

 data-config.xml
 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig>
 <dataSource
   batchSize="2000"
   name="test"
   type="JdbcDataSource"
   driver="oracle.jdbc.OracleDriver"
   url="jdbc:oracle:thin:@ldap:"
   user="myUser"
   password="pwd"/>
 <document>
 <entity name="test_entity"
   docRoot="true"
   dataSource="test"
   query="select name, age from test_user">
 </entity>
 </document>
 </dataConfig>

 On Thu, Apr 16, 2015 at 3:18 PM, Vishal Swaroop vishal@gmail.com
 wrote:

  Thanks Kaushik & Erick..
 
  Though I can populate uuid by using a combination of fields, I need to
  change the type to string, else it throws Invalid UUID String
  <field name="uuid" type="string" indexed="true" stored="true"
  required="true" multiValued="false"/>
 
  a) I will have ~80 million records and am wondering if performance might
  be an issue
  b) So, during update I can still use the combination of fields, i.e. uuid?
 
  On Thu, Apr 16, 2015 at 2:44 PM, Erick Erickson erickerick...@gmail.com
 
  wrote:
 
  This seems relevant:
 
 
 
 http://stackoverflow.com/questions/16914324/solr-4-missing-required-field-uuid
 
  Best,
  Erick
 
  On Thu, Apr 16, 2015 at 11:38 AM, Kaushik kaushika...@gmail.com
 wrote:
   You seem to have defined the field, but not populating it in the
 query.
  Use
   a combination of fields to come up with a unique id that can be
  assigned to
   uuid. Does that make sense?
  
   Kaushik
  
   On Thu, Apr 16, 2015 at 2:25 PM, Vishal Swaroop vishal@gmail.com
 
   wrote:
  
   How to generate uuid/ id (maybe in data-config.xml...) for table
 which
  do
   not have any primary key.
  
   Scenario :
   Using DIH I need to import data from database but table does not have
  any
   primary key
   I do have uuid defined in schema.xml and is
    <field name="uuid" type="uuid" indexed="true" stored="true"
    required="true" multiValued="false"/>
    <uniqueKey>uuid</uniqueKey>
  
   data-config.xml
    <?xml version="1.0" encoding="UTF-8" ?>
    <dataConfig>
    <dataSource
      batchSize="2000"
      name="test"
      type="JdbcDataSource"
      driver="oracle.jdbc.OracleDriver"
      url="jdbc:oracle:thin:@ldap:"
      user="myUser"
      password="pwd"/>
    <document>
    <entity name="test_entity"
      docRoot="true"
      dataSource="test"
      query="select name, age from test_user">
    </entity>
    </document>
    </dataConfig>
  
   Error : Document is missing mandatory uniqueKey field: uuid
  
 
 
 



Re: generate uuid/ id for table which do not have any primary key

2015-04-16 Thread Kaushik
You seem to have defined the field but are not populating it in the query.
Use a combination of fields to come up with a unique id that can be assigned
to uuid. Does that make sense?

Kaushik

On Thu, Apr 16, 2015 at 2:25 PM, Vishal Swaroop vishal@gmail.com
wrote:

 How to generate uuid/ id (maybe in data-config.xml...) for table which do
 not have any primary key.

 Scenario :
 Using DIH I need to import data from database but table does not have any
 primary key
 I do have uuid defined in schema.xml and is
 <field name="uuid" type="uuid" indexed="true" stored="true" required="true"
 multiValued="false"/>
 <uniqueKey>uuid</uniqueKey>

 data-config.xml
 <?xml version="1.0" encoding="UTF-8" ?>
 <dataConfig>
 <dataSource
   batchSize="2000"
   name="test"
   type="JdbcDataSource"
   driver="oracle.jdbc.OracleDriver"
   url="jdbc:oracle:thin:@ldap:"
   user="myUser"
   password="pwd"/>
 <document>
 <entity name="test_entity"
   docRoot="true"
   dataSource="test"
   query="select name, age from test_user">
 </entity>
 </document>
 </dataConfig>

 Error : Document is missing mandatory uniqueKey field: uuid
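
Besides concatenating columns in the SELECT, Solr 4.x can mint the key
itself via an update processor chain. A sketch for solrconfig.xml (the chain
name is illustrative, and DIH would need to be pointed at it, e.g. via an
update.chain parameter on the handler):

```xml
<updateRequestProcessorChain name="uuid">
  <!-- fills the uuid field with a random UUID when it is absent -->
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">uuid</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The caveat for the delta-import question above: a random UUID changes on
every import, so re-imported rows become new documents instead of updates.
A deterministic key built from a combination of columns avoids that.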



Problem with SOLR Collection creation

2014-08-28 Thread Kaushik
Hello,

We have deployed a solr.war file to a weblogic server. The web.xml has been
modified to have the path to the SOLR home as follows:
<env-entry><env-entry-name>solr/home</env-entry-name><env-entry-type>java.lang.String</env-entry-type><env-entry-value>D:\SOLR\4.7.0\RegulatoryReview</env-entry-value></env-entry>

The deployment of Solr comes up fine. In the
D:\SOLR\4.7.0\RegulatoryReview directory we have an RR folder under which
the conf directory with the required config files is present (solrconfig.xml,
schema.xml, etc.). But when I try to add the collection to Solr through the
admin console, I get the following error.

Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore
org.apache.solr.common.SolrException: Error CREATEing SolrCore
'RegulatoryReview': Unable to create core: RegulatoryReview Caused by:
class org.apache.solr.search.LRUCache



org.apache.solr.common.SolrException: Error CREATEing SolrCore 'RR': Unable
to create core: RRCaused by: class org.apache.solr.search.LRUCache

at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:546)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:733)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)

at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:57)

at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.wrapRun(WebAppServletContext.java:3730)

at
weblogic.servlet.internal.WebAppServletContext$ServletInvocationAction.run(WebAppServletContext.java:3696)

at
weblogic.security.acl.internal.AuthenticatedSubject.doAs(AuthenticatedSubject.java:321)

at
weblogic.security.service.SecurityManager.runAs(SecurityManager.java:120)

at
weblogic.servlet.internal.WebAppServletContext.securedExecute(WebAppServletContext.java:2273)

at
weblogic.servlet.internal.WebAppServletContext.execute(WebAppServletContext.java:2179)

at
weblogic.servlet.internal.ServletRequestImpl.run(ServletRequestImpl.java:1490)

at
weblogic.work.ExecuteThread.execute(ExecuteThread.java:256)

at weblogic.work.ExecuteThread.run(ExecuteThread.java:221)

Caused by: org.apache.solr.common.SolrException: Unable to create core: RR

at
org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:989)

at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:606)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:509)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:732)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)

at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:56)

... 9 more

Caused by: org.apache.solr.common.SolrException: Could not load config file
D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml

at
org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:530)

at
org.apache.solr.core.CoreContainer.create(CoreContainer.java:597)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:509)

at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:152)

at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)

at
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:733)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:268)

at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:218)

at
weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:57)

... 9 more

Caused by: java.lang.ClassCastException: class
org.apache.solr.search.LRUCache

at java.lang.Class.asSubclass(Class.java:3027)

at

Re: Problem with SOLR Collection creation

2014-08-28 Thread Kaushik
The issue I was facing was that there were additional libraries on the
classpath that were conflicting and not required. Removing those made the
problem disappear.

Thank you,
Kaushik


On Thu, Aug 28, 2014 at 11:50 AM, Shawn Heisey s...@elyograg.org wrote:

 On 8/28/2014 8:28 AM, Kaushik wrote:
  Hello,
 
  We have deployed a solr.war file to a weblogic server. The web.xml has
 been
  modified to have the path to the SOLR home as follows:
 
 <env-entry><env-entry-name>solr/home</env-entry-name><env-entry-type>java.lang.String</env-entry-type><env-entry-value>D:\SOLR\4.7.0\RegulatoryReview</env-entry-value></env-entry>
 
  The deployment of the Solr comes up fine. In the
  D:\SOLR\4.7.0\RegulatoryReview directory we have RR folder under which
 the
  conf directory with the required config files are present
 (solrconfig.xml,
  schema.xml, etc). But when I try to add the collection to SOLR through
 the
  admin console, I get the following error.
 
  Thursday, August 28, 2014 10:06:37 AM ERROR SolrCore
  org.apache.solr.common.SolrException: Error CREATEing SolrCore
  'RegulatoryReview': Unable to create core: RegulatoryReview Caused by:
  class org.apache.solr.search.LRUCache

 It would seem there's a problem with the cache config in your
 solrconfig.xml, or that there's some kind of problem with the Solr jars
 contained within the war.  No testing is done with weblogic, so it's
 always possible it's a class conflict with weblogic itself, but I would
 bet on a config problem first.

  The issue I believe is that it is trying to find
  D:\SOLR\4.7.0\RegulatoryReview\RR\solrconfig.xml by ignoring the conf
  directory in which it should be finding it. What am I doing wrong?

 This is SOLR-5814, a bug in the log messages, not the program logic.  I
 thought it had been fixed by 4.8, but the issue is still unresolved.

 https://issues.apache.org/jira/browse/SOLR-5814

 Thanks,
 Shawn




How to delete documents

2014-03-31 Thread Kaushik
From a database table, we have figured out a way to do the full load and
the delta loads. However, there are scenarios where some of the DB rows get
deleted. How can we have such documents deleted from SOLR indices?

Thanks,
Kaushik
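
A common pattern is to reconcile ids: pull the current primary keys from the
table, compare them with what is indexed, and delete the difference (DIH's
delta import can also do this for you via a deletedPkQuery). A minimal
sketch with illustrative ids:

```python
# Ids currently in the source table vs. ids currently in the Solr index.
db_ids = {"101", "102", "104"}
solr_ids = {"101", "102", "103", "104"}

# Documents that survive in Solr but were deleted from the DB.
stale = sorted(solr_ids - db_ids)
print(stale)  # ['103']

# Each stale id would then be posted to /update as
# <delete><id>103</id></delete>, followed by a commit.
```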


Re: Faceting on multivalued field

2011-04-04 Thread Kaushik Chakraborty
Are you suggesting changing the DB query of the nested entity that fetches
the comments (the query is in my post), or can something be done during
indexing, e.g. using Transformers?

Thanks,
Kaushik


On Mon, Apr 4, 2011 at 8:07 AM, Erick Erickson erickerick...@gmail.comwrote:

 Why not count them on the way in and just store that number along
 with the original e-mail?

 Best
 Erick

 On Sun, Apr 3, 2011 at 10:10 PM, Kaushik Chakraborty kaych...@gmail.com
 wrote:

   Ok. My expectation was that since comment_post_id is a multiValued field,
   it would appear multiple times (i.e. once per comment), and hence faceting
   on that field would also give me the count of the documents in which each
   comment_post_id value appears.
 
   My requirement is getting the total for every document, i.e. finding the
   number of comments per post in the whole corpus. To explain it more
   clearly, I'm getting a result XML something like this
 
   <str name="post_id">46</str>
   <str name="post_text">Hello World</str>
   <str name="person_id">20</str>
   <arr name="comment_id">
     <str>9</str>
     <str>10</str>
   </arr>
   <arr name="comment_person_id">
     <str>19</str>
     <str>2</str>
   </arr>
   <arr name="comment_post_id">
     <str>46</str>
     <str>46</str>
   </arr>
   <arr name="comment_text">
     <str>Hello - from World</str>
     <str>Hi</str>
   </arr>
  
   <lst name="facet_fields">
    <lst name="comment_post_id">
   <int name="46">1</int>
 
  I need the count to be 2 as the post 46 has 2 comments.
 
    What other way can I approach this?
 
  Thanks,
  Kaushik
 
 
  On Mon, Apr 4, 2011 at 4:29 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   Hmmm, I think you're misunderstanding faceting. It's counting the
   number of documents that have a particular value. So if you're
   faceting on comment_post_id, there is one and only one document
   with that value (assuming that the comment_post_ids are unique).
    Which is what's being reported. This will be quite expensive on a
   large corpus, BTW.
  
   Is your task to show the totals for *every* document in your corpus or
   just the ones in a display page? Because if the latter, your app could
   just count up the number of elements in the XML returned for the
   multiValued comments field.
  
   If that's not relevant, could you explain a bit more why you need this
   count?
  
   Best
   Erick
  
   On Sun, Apr 3, 2011 at 2:31 PM, Kaushik Chakraborty 
 kaych...@gmail.com
   wrote:
  
Hi,
   
My index contains a root entity Post and a child entity Comments.
   Each
post can have multiple comments. data-config.xml:
   
 <document>
    <entity name="posts" transformer="TemplateTransformer"
 dataSource="jdbc" query="...">
    
    <field column="post_id" />
    <field column="post_text"/>
    <field column="person_id"/>
    <entity name="comments" dataSource="jdbc" query="select *
 from comments where post_id = ${posts.post_id}">
    <field column="comment_id" />
    <field column="comment_text" />
    <field column="comment_person_id" />
    <field column="comment_post_id" />
   </entity>
    </entity>
 </document>
   
 The schema has all columns of the comment entity as multiValued fields and
 all fields are indexed & stored. My requirement is to count the number of
 comments for each post. The approach I'm taking is to query on *:* and
 facet the result on comment_post_id so that it gives the count of
 comments that occurred for that post.

 But I'm getting an incorrect result, e.g. if a post has 2 comments, the
 multivalued fields are populated alright but the facet count is coming as 1
 (for that post_id). What else do I need to do?
   
   
Thanks,
Kaushik
   
  
 

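Erick's suggestion to count on the way in can be sketched as a correlated
subquery in the DIH root entity's query. Demonstrated here against sqlite3
with the thread's sample data (table and column names mirror the post;
everything else is illustrative):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE posts (post_id INTEGER, post_text TEXT);
    CREATE TABLE comments (comment_id INTEGER, comment_post_id INTEGER);
    INSERT INTO posts VALUES (46, 'Hello World');
    INSERT INTO comments VALUES (9, 46), (10, 46);
""")

# Count comments per post in the SELECT itself, so the number is stored
# on each document and no faceting is needed at query time.
rows = con.execute("""
    SELECT p.post_id,
           (SELECT COUNT(*) FROM comments c
             WHERE c.comment_post_id = p.post_id) AS comment_count
      FROM posts p
""").fetchall()
print(rows)  # [(46, 2)]
```

The same SELECT, minus the sqlite scaffolding, would go into the root
entity's query attribute, with comment_count mapped to a stored int field.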


Faceting on multivalued field

2011-04-03 Thread Kaushik Chakraborty
Hi,

My index contains a root entity Post and a child entity Comments. Each
post can have multiple comments. data-config.xml:

<document>
<entity name="posts" transformer="TemplateTransformer"
dataSource="jdbc" query="...">

<field column="post_id" />
<field column="post_text"/>
<field column="person_id"/>
<entity name="comments" dataSource="jdbc" query="select *
from comments where post_id = ${posts.post_id}">
<field column="comment_id" />
<field column="comment_text" />
<field column="comment_person_id" />
<field column="comment_post_id" />
   </entity>
</entity>
</document>

The schema has all columns of the comment entity as multiValued fields and
all fields are indexed & stored. My requirement is to count the number of
comments for each post. The approach I'm taking is to query on *:* and
facet the result on comment_post_id so that it gives the count of
comments that occurred for that post.

But I'm getting an incorrect result, e.g. if a post has 2 comments, the
multivalued fields are populated alright but the facet count is coming as 1
(for that post_id). What else do I need to do?


Thanks,
Kaushik


Re: SOLR DIH importing MySQL text column as a BLOB

2011-03-16 Thread Kaushik Chakraborty
The query's there in the data-config.xml. And the query's fetching as
expected from the database.

Thanks,
Kaushik


On Wed, Mar 16, 2011 at 9:21 PM, Gora Mohanty g...@mimirtech.com wrote:

 On Wed, Mar 16, 2011 at 2:29 PM, Stefan Matheis
 matheis.ste...@googlemail.com wrote:
  Kaushik,
 
  i just remembered an ML-Post few weeks ago .. same problem while
  importing geo-data
  (
 http://lucene.472066.n3.nabble.com/Solr-4-0-Spatial-Search-How-to-tp2245592p2254395.html
 )
  - the solution was:
 
  CAST( CONCAT( lat, ',', lng ) AS CHAR )
 
  at that time I searched a little bit for the reason and afaik there was
  a bug in mysql/jdbc which produced that binary output under certain
  conditions
 [...]

 As Stefan mentions, there might be a way to solve this.

 Could you show us the query in DIH that you are using
 when you get this BLOB, i.e., the SELECT statement
 that goes to the database?

 It might also be instructive for you to try that same
 SELECT directly in a mysql interface.

 Regards,
 Gora



SOLR DIH importing MySQL text column as a BLOB

2011-03-15 Thread Kaushik Chakraborty
I have a column for posts in MySQL of type `text`, and I've tried the
corresponding `field-type`s for it in Solr's `schema.xml`, e.g. `string`,
`text`, `text_ws`. But whenever I import it using the DIH, it gets imported
as a BLOB object. I checked; this happens only for columns of type `text`
and not for `varchar` (those are getting indexed as strings). Hence, the
posts field is not searchable.

I found out about this issue, after repeated search failures, when I did a
`*:*` query on Solr. A sample response:

<result name="response" numFound="223" start="0" maxScore="1.0">
<doc>
<float name="score">1.0</float>
<str name="solr_post_bio">[B@10a33ce2</str>
<date name="solr_post_created_at">2011-02-21T07:02:55Z</date>
<str name="solr_post_email">test.acco...@gmail.com</str>
<str name="solr_post_first_name">Test</str>
<str name="solr_post_last_name">Account</str>
<str name="solr_post_message">[B@2c93c4f1</str>
<str name="solr_post_status_message_id">1</str>
</doc>

The `data-config.xml` :

<document>
 <entity name="posts" dataSource="jdbc" query="select
 p.person_id as solr_post_person_id,
 pr.first_name as solr_post_first_name,
 pr.last_name as solr_post_last_name,
 u.email as solr_post_email,
 p.message as solr_post_message,
 p.id as solr_post_status_message_id,
 p.created_at as solr_post_created_at,
 pr.bio as solr_post_bio
 from posts p, users u, profiles pr where p.person_id = u.id and
p.person_id = pr.person_id and p.type='StatusMessage'">
 <field column="solr_post_person_id" />
 <field column="solr_post_first_name"/>
 <field column="solr_post_last_name" />
 <field column="solr_post_email" />
 <field column="solr_post_message" />
 <field column="solr_post_status_message_id" />
 <field column="solr_post_created_at" />
 <field column="solr_post_bio"/>
   </entity>
  </document>

The `schema.xml` :

<fields>
<field name="solr_post_status_message_id" type="string"
indexed="true" stored="true" required="true" />
 <field name="solr_post_message" type="text_ws" indexed="true"
stored="true" required="true" />
 <field name="solr_post_bio" type="text" indexed="false" stored="true"
/>
 <field name="solr_post_first_name" type="string" indexed="false"
stored="true" />
 <field name="solr_post_last_name" type="string" indexed="false"
stored="true" />
 <field name="solr_post_email" type="string" indexed="false"
stored="true" />
 <field name="solr_post_created_at" type="date" indexed="false"
stored="true" />
</fields>
<uniqueKey>solr_post_status_message_id</uniqueKey>
<defaultSearchField>solr_post_message</defaultSearchField>


Thanks,
Kaushik
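
Two fixes are commonly suggested for this. One is the CAST approach from
the geo-data thread Stefan linked; the other is the JdbcDataSource
convertType attribute. A sketch (only convertType is the relevant real
attribute here; the connection details are illustrative):

```xml
<!-- Option 1: ask DIH to convert JDBC types (e.g. byte[]) to the type
     the schema field expects. -->
<dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb" user="solr" password="pwd"
            convertType="true"/>

<!-- Option 2: cast the TEXT columns in the query itself:
     select CAST(p.message AS CHAR) as solr_post_message, ...
-->
```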