Re: [EXTERNAL] Re: How to merge child documents using DataImportHandler

2018-05-27 Thread David M Giannone
?





 Original message 
From: Mikhail Khludnev 
Date: 5/27/18 3:23 PM (GMT-05:00)
To: solr-user 
Subject: [EXTERNAL] Re: How to merge child documents using DataImportHandler

Hello, Abhijit.
Have you tried to drop some of child=true? They usually cause slicing to
separate documents, rather than default "merge to root" mode.

On Sun, May 27, 2018 at 9:48 PM, Abhijit Pawar 
wrote:

> ​​
> Hello,
>
> I am using DataImportHandler to index data from mongoDB.
>
> Here's how my data-source-config file looks like:
>
> 
>  driver="com.mongodb.jdbc.MongoDriver" url="mongodb://< Address>>:27017/<>"/>
> 
> entityA(Root Entity) -   *products*
>entityB (child=true,pk=unique field) - *skus*
>   entityC - *attributevalues*
>  entityD - *attributenames*
> entityE(child=true,pk=unique field) - *skupricelist*
>
>
> When data is indexed separate  *skupricelist* documents are created for
> each attribute (since *skupricelist* is child of *skus* and under
> *attributenames*).How can I merge / join the all those skupricelist
> documents with all attributes in same document?
>
> example :
> Right now the documents created are as follows:
>
> Separate document 1
> {
> 'PRODUCT NAME':'ABC',
> 'SKU NAME':'ABC-1',
> 'Color':'Red',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 2
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Size':'10',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 3
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Is there a way I can join them like this?
>
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Color':'Red',
> 'Size':'10',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Thank You.
> Regards,
>
> Abhijit
>



--
Sincerely yours
Mikhail Khludnev




Re: Slower queries with 7.3.1?

2018-05-27 Thread Will Currie
Thanks Deepak. I think I understand the cause of the slowdown. There are
some flamegraphs (from stack sampling) on SOLR-12407. I also captured some
traces using YourKit.

On Sun, May 27, 2018 at 1:21 PM, Deepak Goel  wrote:

> Is it possible to profile the code to find the exact points which are
> taking more time comparatively?
>
> On Sun, 27 May 2018, 06:02 Will Currie,  wrote:
>
> > I raised https://issues.apache.org/jira/browse/SOLR-12407. In case anybody
> > else sees a similar slowdown with boosts.
> >
> > On Sat, May 26, 2018 at 4:10 PM, Will Currie  wrote:
> >
> > > I did some more (micro)benchmarking with a single query. Setting the
> > > query cache size to zero I see 400ms response time on 7.2 and 600ms on
> > > 7.3. Running curl in a loop on my laptop. ~4M docs. ~3G index. 1M total
> > > hits for the query.. Yup. I'm reluctant to post the query. It has
> > > multiple 300+ character streams of if,product,map calls in multiple
> > > boost parameters.
> > >
> > > I realise my query is likely ridiculous (inefficient, better done another
> > > way, etc) but LUCENE-8099 mentions:
> > > "Re performance: there shouldn't be any reason for things to be slower ...
> > > It might be useful to add some examples of these queries to the benchmark
> > > tests though."
> > >
> > > Maybe I have such a benchmark.. Grasping at straws guess, I noticed 7.2
> > > sticks with floats. 7.3 does a few frames of math with doubles before
> > > returning to floats.
> > >
> > > jstack from 7.2:
> > >
> > > "qtp2136344592-24" #24 prio=5 os_prio=31 tid=0x7f80630e5000 nid=0x7103 runnable [0x749bb000]
> > >    java.lang.Thread.State: RUNNABLE
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > * at org.apache.lucene.queries.function.BoostedQuery$CustomScorer.score(BoostedQuery.java:124)*
> > > at org.apache.lucene.search.TopScoreDocCollector$SimpleTopScoreDocCollector$1.collect(TopScoreDocCollector.java:64)
> > > at org.apache.lucene.search.Weight$DefaultBulkScorer.scoreAll(Weight.java:233)
> > > at org.apache.lucene.search.Weight$DefaultBulkScorer.score(Weight.java:184)
> > > at org.apache.lucene.search.BulkScorer.score(BulkScorer.java:39)
> > > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:660)
> > > at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:462)
> > > at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:215)
> > >
> > > jstack from 7.3.1:
> > >
> > > "qtp559670971-25" #25 prio=5 os_prio=31 tid=0x7fe23fa0c000 nid=0x7303 runnable [0x7b024000]
> > >    java.lang.Thread.State: RUNNABLE
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.IfFunction$1.floatVal(IfFunction.java:64)
> > > at org.apache.lucene.queries.function.valuesource.ProductFloatFunction.func(ProductFloatFunction.java:41)
> > > at org.apache.lucene.queries.function.valuesource.MultiFloatFunction$1.floatVal(MultiFloatFunction.java:82)
> > > * at org.apache.lucene.queries.function.docvalues.FloatDocValues.doubleVal(
> 

Re: How to merge child documents using DataImportHandler

2018-05-27 Thread Abhijit Pawar
Hi Mikhail,

Yes, I already tried dropping child=true for skupricelist.
However, then it does not index data from that collection at all.

I need it because I am inheriting some properties from the skus collection
and some from the attributevalues and attributenames collections.
Also, data from the skus, attributevalues and attributenames collections
is already merged into the same document.
However, data from skupricelist is split into separate documents for
every attribute.

Regards,

Abhijit


On Sun, May 27, 2018 at 2:24 PM Mikhail Khludnev  wrote:

> Hello, Abhijit.
> Have you tried to drop some of child=true? They usually cause slicing to
> separate documents, rather than default "merge to root" mode.
>
> On Sun, May 27, 2018 at 9:48 PM, Abhijit Pawar  >
> wrote:
>
> > ​​
> > Hello,
> >
> > I am using DataImportHandler to index data from mongoDB.
> >
> > Here's how my data-source-config file looks like:
> >
> > 
> >  > driver="com.mongodb.jdbc.MongoDriver" url="mongodb://< > Address>>:27017/<>"/>
> > 
> > entityA(Root Entity) -   *products*
> >entityB (child=true,pk=unique field) - *skus*
> >   entityC - *attributevalues*
> >  entityD - *attributenames*
> > entityE(child=true,pk=unique field) - *skupricelist*
> >
> >
> > When data is indexed separate  *skupricelist* documents are created for
> > each attribute (since *skupricelist* is child of *skus* and under
> > *attributenames*).How can I merge / join the all those skupricelist
> > documents with all attributes in same document?
> >
> > example :
> > Right now the documents created are as follows:
> >
> > Separate document 1
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU NAME':'ABC-1',
> > 'Color':'Red',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Separate document 2
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU':'ABC-1',
> > 'Size':'10',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Separate document 3
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU':'ABC-1',
> > 'Type':'Leather',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Is there a way I can join them like this?
> >
> > {
> > 'PRODUCT NAME':'ABC',
> > 'SKU':'ABC-1',
> > 'Color':'Red',
> > 'Size':'10',
> > 'Type':'Leather',
> > 'SKUPricelist':'SKUPricelistA'
> > }
> >
> > Thank You.
> > Regards,
> >
> > Abhijit
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
>


Re: delta-update alternative on filechanges when using FileListEntityProcessor

2018-05-27 Thread Alexandre Rafalovitch
The best practice is not to use DIH in production. It is great for several
rounds of prototyping but then things get messy and uneven as you found.
The delete logic is always extra messy.

So, this may be a good point to switch to an external client and implement
the monitoring logic there.

Regards,
Alex
P.s. or you could reindex everything periodically in a separate collection
and swap it into production. No delete logic.
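
A minimal sketch of the monitoring logic such an external client might
use, in Python. It only works out which files were added, changed or
deleted between two scans of a folder; sending the resulting updates and
deletes to Solr is left out. The function names and the sample paths are
hypothetical, and comparing modification times is an assumption (a
content hash would be stricter).

```python
import os


def snapshot(folder):
    """Map every file path under `folder` to its last-modified time."""
    state = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            state[path] = os.path.getmtime(path)
    return state


def diff_snapshots(old, new):
    """Compare two snapshots and return (added, changed, deleted) paths."""
    added = sorted(p for p in new if p not in old)
    deleted = sorted(p for p in old if p not in new)
    changed = sorted(p for p in new if p in old and new[p] != old[p])
    return added, changed, deleted


# Hypothetical snapshots from two consecutive scans (path -> mtime).
previous = {"docs/a.txt": 100.0, "docs/b.txt": 100.0, "docs/old.txt": 50.0}
current = {"docs/a.txt": 100.0, "docs/b.txt": 250.0, "docs/c.txt": 300.0}

added, changed, deleted = diff_snapshots(previous, current)
print(added, changed, deleted)
```

The client would index the added and changed files and issue deletes for
the deleted ones, then store the current snapshot for the next run.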

On Sun, May 27, 2018, 2:48 PM Thomas Lustig,  wrote:

> I configured a DataImportHandler using a FileListEntityProcessor to import
> files from a folder.
> This setup works really great, but i do not now how i should handle changes
> on the filesystem (e.g. files added, deleted,...)
> Should I always do a "full-import"? As far as i read "delta-import" is only
> supported by SqlEntityProcessor.
> Is there a best practise, that is recommended?
> Thanks in advance for helping me
>
> Br
> Tom
>


Re: How to merge child documents using DataImportHandler

2018-05-27 Thread Mikhail Khludnev
Hello, Abhijit.
Have you tried to drop some of child=true? They usually cause slicing to
separate documents, rather than default "merge to root" mode.

On Sun, May 27, 2018 at 9:48 PM, Abhijit Pawar 
wrote:

> ​​
> Hello,
>
> I am using DataImportHandler to index data from mongoDB.
>
> Here's how my data-source-config file looks like:
>
> 
>  driver="com.mongodb.jdbc.MongoDriver" url="mongodb://< Address>>:27017/<>"/>
> 
> entityA(Root Entity) -   *products*
>entityB (child=true,pk=unique field) - *skus*
>   entityC - *attributevalues*
>  entityD - *attributenames*
> entityE(child=true,pk=unique field) - *skupricelist*
>
>
> When data is indexed separate  *skupricelist* documents are created for
> each attribute (since *skupricelist* is child of *skus* and under
> *attributenames*).How can I merge / join the all those skupricelist
> documents with all attributes in same document?
>
> example :
> Right now the documents created are as follows:
>
> Separate document 1
> {
> 'PRODUCT NAME':'ABC',
> 'SKU NAME':'ABC-1',
> 'Color':'Red',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 2
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Size':'10',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Separate document 3
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Is there a way I can join them like this?
>
> {
> 'PRODUCT NAME':'ABC',
> 'SKU':'ABC-1',
> 'Color':'Red',
> 'Size':'10',
> 'Type':'Leather',
> 'SKUPricelist':'SKUPricelistA'
> }
>
> Thank You.
> Regards,
>
> Abhijit
>



-- 
Sincerely yours
Mikhail Khludnev


How to merge child documents using DataImportHandler

2018-05-27 Thread Abhijit Pawar
Hello,

I am using DataImportHandler to index data from mongoDB.

Here's what my data-source-config file looks like:




entityA(Root Entity) -   *products*
   entityB (child=true,pk=unique field) - *skus*
  entityC - *attributevalues*
 entityD - *attributenames*
entityE(child=true,pk=unique field) - *skupricelist*


When data is indexed, separate *skupricelist* documents are created for
each attribute (since *skupricelist* is a child of *skus* and under
*attributenames*). How can I merge / join all those skupricelist
documents, with all their attributes, into the same document?

Example:
Right now the documents created are as follows:

Separate document 1
{
'PRODUCT NAME':'ABC',
'SKU NAME':'ABC-1',
'Color':'Red',
'SKUPricelist':'SKUPricelistA'
}

Separate document 2
{
'PRODUCT NAME':'ABC',
'SKU':'ABC-1',
'Size':'10',
'SKUPricelist':'SKUPricelistA'
}

Separate document 3
{
'PRODUCT NAME':'ABC',
'SKU':'ABC-1',
'Type':'Leather',
'SKUPricelist':'SKUPricelistA'
}

Is there a way I can join them like this?

{
'PRODUCT NAME':'ABC',
'SKU':'ABC-1',
'Color':'Red',
'Size':'10',
'Type':'Leather',
'SKUPricelist':'SKUPricelistA'
}

Thank You.
Regards,

Abhijit
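
For illustration, the merge asked about above can be sketched outside
DataImportHandler, in Python. This is not a DIH configuration; it only
shows the intended result, assuming the sliced documents can be grouped
on the fields they share. The choice of key fields and the function name
are hypothetical, and the sample data uses 'SKU' in every document
(the first document in the post uses 'SKU NAME').

```python
# Hypothetical key: the fields that identify one SKU / pricelist pair.
KEY_FIELDS = ("PRODUCT NAME", "SKU", "SKUPricelist")


def merge_sliced_docs(docs, key_fields=KEY_FIELDS):
    """Merge documents that share the same key fields into one document."""
    merged = {}
    for doc in docs:
        key = tuple(doc.get(field) for field in key_fields)
        # Create the merged document the first time a key is seen,
        # then copy every field of the sliced document into it.
        merged.setdefault(key, {}).update(doc)
    return list(merged.values())


sliced = [
    {"PRODUCT NAME": "ABC", "SKU": "ABC-1", "Color": "Red",
     "SKUPricelist": "SKUPricelistA"},
    {"PRODUCT NAME": "ABC", "SKU": "ABC-1", "Size": "10",
     "SKUPricelist": "SKUPricelistA"},
    {"PRODUCT NAME": "ABC", "SKU": "ABC-1", "Type": "Leather",
     "SKUPricelist": "SKUPricelistA"},
]

combined = merge_sliced_docs(sliced)
print(combined)
```

If two sliced documents carried different values for the same attribute,
the later one would overwrite the earlier one in this sketch.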


delta-update alternative on filechanges when using FileListEntityProcessor

2018-05-27 Thread Thomas Lustig
I configured a DataImportHandler using a FileListEntityProcessor to import
files from a folder.
This setup works really well, but I do not know how I should handle changes
on the filesystem (e.g. files added, deleted, ...).
Should I always do a "full-import"? As far as I have read, "delta-import" is
only supported by SqlEntityProcessor.
Is there a recommended best practice?
Thanks in advance for helping me.

Br
Tom


Weird behavioural differences between pf in dismax and edismax

2018-05-27 Thread Sambhav Kothari
Hello,

I experienced a weird behaviour with the dismax and edismax query parsers.
Dismax will include pf boosts when we query something that has just a
single word; edismax, on the other hand, will not include pf boosts.

The result is that a dismax and an edismax handler with the same set of
defaults return different results for single-word queries (e.g. "Hello")
but the same results for multi-word queries (e.g. "Hello World").

Is this expected?

Regards,
Sam
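
One way to compare the two parsers side by side is to send the same
single-word query to each with debugQuery=true and look at the parsed
query in the debug output. The Python sketch below only builds the two
request URLs; the host, collection name and the qf/pf field names are
assumptions, not taken from the setup described above.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; adjust host and collection to your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"


def debug_url(parser, query, qf="title", pf="title^10"):
    """Build a select URL that runs `query` through the given parser
    with debug output enabled. Field names are placeholders."""
    params = {
        "q": query,
        "defType": parser,      # "dismax" or "edismax"
        "qf": qf,
        "pf": pf,
        "debugQuery": "true",   # adds the parsed query to the response
    }
    return SOLR_SELECT + "?" + urlencode(params)


urls = {parser: debug_url(parser, "Hello") for parser in ("dismax", "edismax")}
for parser, url in urls.items():
    print(parser, url)
```

Fetching both URLs and comparing the parsed query in each response would
show whether a phrase-boost clause is present for the single-word query.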