DIH import fails when importing multi-valued field

2019-06-26 Thread Robert Dadzie
Hi All,


I'm trying to use DIH to import about 150k documents to Solr. One of the 
multi-valued fields I need to import stores about 1500 unique IDs per record. I 
tried increasing the 'ramBufferSizeMB' setting but that didn't help. I get an 
ArrayIndexOutOfBoundsException and I can't make any sense of it. An 
extract of the error log is below; any assistance would be greatly appreciated.


o.a.s.h.d.DocBuilder Exception while processing: matter document : 
SolrInputDocument(fields: 
[]):org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to 
execute query: SELECT * FROM myView WITH (NOLOCK)  ORDER BY viewId Processing 
Document # 1
   at 
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:69)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:327)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource.createResultSetIterator(JdbcDataSource.java:288)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:283)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource.getData(JdbcDataSource.java:52)
   at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.initQuery(SqlEntityProcessor.java:59)
   at 
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
   at 
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:267)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:476)
   at 
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:415)
   at 
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:330)
   at 
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:233)
   at 
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:424)
   at 
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:483)
   at 
org.apache.solr.handler.dataimport.DataImporter.lambda$runAsync$0(DataImporter.java:466)
   at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException
   at java.net.SocketInputStream.socketRead0(Native Method)
   at 
java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
   at java.net.SocketInputStream.read(SocketInputStream.java:170)
   at java.net.SocketInputStream.read(SocketInputStream.java:141)
   at com.jnetdirect.jsql.DBComms.receive(DBComms.java:777)
   at com.jnetdirect.jsql.IOBuffer.sendCommand(IOBuffer.java:248)
   at 
com.jnetdirect.jsql.JSQLStatement.sendExecute(JSQLStatement.java:2478)
   at 
com.jnetdirect.jsql.JSQLStatement.doExecute(JSQLStatement.java:2447)
   at 
com.jnetdirect.jsql.JSQLStatement.execute(JSQLStatement.java:2433)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.executeStatement(JdbcDataSource.java:349)
   at 
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.<init>(JdbcDataSource.java:321)
   ... 14 more

Thanks,
Robert Dadzie







Is it possible to filter documents using fields from multiple entities?

2019-06-26 Thread Robert Dadzie
I have 2 entities and I need to devise a query that can filter documents using 
fields from both entities. Is this possible? If so, can you provide a sample 
query showing how to do so?

Thanks,
Robert Dadzie







Re: SolrInputDocument setField method

2019-06-26 Thread Mark Sholund
I noticed this yesterday as well. The toString() and jsonStr() (in later 
versions) of SolrJ both include things like

toString(): 
{id=id=foo123, ...}
or
jsonStr(): 
{"id":"id=foo123",...}

However Solr does not reject the documents so this must just be an issue with 
the two methods.

On Wed, Jun 26, 2019 at 12:31 PM, Samuel Kasimalla  wrote:

> Hi Vicenzo,
>
> May be looking at the overridden toString() would give you a clue.
>
> The second part, I don't think SolrJ holds it it twice(if you are worried
> about redundant usage of memory), BUT if you haven't used SolrJ so far and
> wanted to know if this is the format in which it pushes to Solr, I'm pretty
> sure it doesn't push this format into Solr.
>
> Thanks,
> Sam
> https://www.linkedin.com/in/skasimalla
>
> On Wed, Jun 26, 2019 at 11:52 AM Vincenzo D'Amore 
> wrote:
>
>> Hi all,
>>
>> I have a very basic question related to the SolrInputDocument behaviour.
>>
>> Looking at SolrInputDocument source code I found how the method setField
>> works:
>>
>> public void setField(String name, Object value )
>> {
>> SolrInputField field = new SolrInputField( name );
>> _fields.put( name, field );
>> field.setValue( value );
>> }
>>
>> The field name is "duplicated" into the SolrInputField.
>>
>> For example, if I'm storing a field "color" with value "red" what we have
>> is a Map like this:
>>
>> { "key" : "color", "value" : { "name" : "color", "value" : "red" } }
>>
>> the name field "color" appears twice. Very likely there is a reason for
>> this, could you please point me in the right direction?
>>
>> For example, I'm worried about at what happens with SolrJ when I'm sending
>> a lot of documents, where for each field the fieldName is sent twice.
>>
>> Thanks,
>> Vincenzo
>>
>>
>> --
>> Vincenzo D'Amore

Re: Large Filter Query

2019-06-26 Thread Lucky Sharma
Thanks, David, Shawn, Jagdish

Help and suggestions are really appreciated.

Regards,
Lucky Sharma

On Thu, Jun 27, 2019 at 12:50 AM Shawn Heisey  wrote:
>
> On 6/26/2019 12:56 PM, Lucky Sharma wrote:
> > @Shawn: Sorry I forgot to mention the corpus size: the corpus size is
> > around 3 million docs, where we need to query for 1500 docs and run
> > aggregations, sorting, search on them.
>
> Assuming the documents aren't HUGE, that sounds like something Solr
> should be able to handle pretty easily on a typical modern 64-bit
> system.  I handled multiple indexes much larger than that with an 8GB
> heap on Linux servers with 64GB total memory.  Most likely you won't
> need anything that large.
>
> Depending on exactly what you're going to do with it, that's probably
> also something easily handled by a relational database or a more modern
> NoSQL solution ... especially if "traditional search" is not part of
> your goals.  Solr can do things beyond search, but search is where
> everything is optimized, so if search is not part of your goal, you
> might want to look elsewhere.
>
> > @David: But will that not be a performance hit (resource incentive)?
> > since it will have that many terms to search upon, the query parse
> > tree will be big, isn't it?
>
> The terms query parser is far more efficient than a simple boolean "OR"
> search with the same number of terms.  It is highly recommended for use
> cases like you have described.
>
> The default maxBooleanClauses limit that Lucene enforces on boolean
> queries is 1024 ... but this is an arbitrary value.  The limit was
> designed as a way to prevent massive queries from running when it wasn't
> truly intended for such queries to have been created in the first place.
>   It is common for users to increase the default limit.
>
> You're probably going to want to send your queries as POST requests,
> because those have a 2MB default body-size restriction, which can be
> increased.  GET requests are limited by the HTTP header size
> restriction, which defaults 8192 bytes on all web server implementations
> I have checked, including the one that's included with Solr.  Increasing
> that is possible, but not recommended ... especially to the sizes you
> would need for the queries you have described.
>
> Thanks,
> Shawn



-- 
Warm Regards,

Lucky Sharma
Contact No :+91 9821559918


Re: Large Filter Query

2019-06-26 Thread Shawn Heisey

On 6/26/2019 12:56 PM, Lucky Sharma wrote:

@Shawn: Sorry I forgot to mention the corpus size: the corpus size is
around 3 million docs, where we need to query for 1500 docs and run
aggregations, sorting, search on them.


Assuming the documents aren't HUGE, that sounds like something Solr 
should be able to handle pretty easily on a typical modern 64-bit 
system.  I handled multiple indexes much larger than that with an 8GB 
heap on Linux servers with 64GB total memory.  Most likely you won't 
need anything that large.


Depending on exactly what you're going to do with it, that's probably 
also something easily handled by a relational database or a more modern 
NoSQL solution ... especially if "traditional search" is not part of 
your goals.  Solr can do things beyond search, but search is where 
everything is optimized, so if search is not part of your goal, you 
might want to look elsewhere.



@David: But will that not be a performance hit (resource incentive)?
since it will have that many terms to search upon, the query parse
tree will be big, isn't it?


The terms query parser is far more efficient than a simple boolean "OR" 
search with the same number of terms.  It is highly recommended for use 
cases like you have described.


The default maxBooleanClauses limit that Lucene enforces on boolean 
queries is 1024 ... but this is an arbitrary value.  The limit was 
designed as a way to prevent massive queries from running when it wasn't 
truly intended for such queries to have been created in the first place. 
 It is common for users to increase the default limit.
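If you do end up raising it, the setting lives in solrconfig.xml -- this is
only a sketch, the value shown is arbitrary, and the element sits under the
<query> section:

   <query>
     <maxBooleanClauses>4096</maxBooleanClauses>
     ...
   </query>

With the terms query parser you normally won't hit this limit at all, because
the IDs are not expanded into individual boolean clauses.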


You're probably going to want to send your queries as POST requests, 
because those have a 2MB default body-size restriction, which can be 
increased.  GET requests are limited by the HTTP header size 
restriction, which defaults to 8192 bytes on all web server implementations 
I have checked, including the one that's included with Solr.  Increasing 
that is possible, but not recommended ... especially to the sizes you 
would need for the queries you have described.
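As an illustration only (collection, field and facet names here are made up),
a POST keeps the long parameter out of the URL entirely:

   curl http://localhost:8983/solr/mycollection/select \
        --data-urlencode 'q={!terms f=id}id1,id2,id3' \
        --data-urlencode 'facet=true' \
        --data-urlencode 'facet.field=category' \
        --data-urlencode 'rows=1500'

curl sends this as a form-encoded POST body, which Solr accepts just like
URL parameters.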


Thanks,
Shawn


Re: Large Filter Query

2019-06-26 Thread jai dutt
The terms query parser is the best way to do that.
You can check the link below for performance details.

http://yonik.com/solr-terms-query/

On Thu, 27 Jun, 2019, 12:31 AM Lucky Sharma,  wrote:

> Thanks, Jagdish
> But what if we need to perform search and filtering on those 1.5k doc
> ids results, also for URI error, we can go with the POST approach,
> and what if the data is not sharded.
>
> Regards,
> Lucky Sharma
>
> On Thu, Jun 27, 2019 at 12:28 AM jai dutt 
> wrote:
> >
> > 1. No Solr is not for id search.  rdms a better option.
> > 2. Yes correct it going to impact query  performance. And you may got
> > large uri error.
> > 3 ya you can pass ids internally by writing any custom parser.or divide
> > data into different shard.
> >
> >
> >
> > On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:
> >
> > > Hi all,
> > >
> > > What we are doing is, we will be having a set of unique Ids of solr
> > > document at max 1500, we need to run faceting and sorting among them.
> > > there is no direct search involved.
> > > It's a head-on search since we already know the document unique keys
> > > beforehand.
> > >
> > > 1. Is Solr a better use case for such kind of problem?
> > > 2. Since we will be passing 1500 unique document ids, As per my
> > > understanding it will impact query tree as it will grow bigger. Will
> > > there be any other impacts?
> > > 3. Is it wise to use or solve the situation in this way?
> > >
> > >
> > > --
> > > Warm Regards,
> > >
> > > Lucky Sharma
> > >
>
>
>
> --
> Warm Regards,
>
> Lucky Sharma
> Contact No :+91 9821559918
>


Re: Large Filter Query

2019-06-26 Thread David Hastings
yeah there is a performance hit but that is expected.  in my scenario i
sometimes pass a few thousand using this method, but i pre-process my
results since it's a set.  you will not have any issues with the uri length
if you are using POST.

On Wed, Jun 26, 2019 at 3:02 PM Lucky Sharma  wrote:

> Thanks, Jagdish
> But what if we need to perform search and filtering on those 1.5k doc
> ids results, also for URI error, we can go with the POST approach,
> and what if the data is not sharded.
>
> Regards,
> Lucky Sharma
>
> On Thu, Jun 27, 2019 at 12:28 AM jai dutt 
> wrote:
> >
> > 1. No Solr is not for id search.  rdms a better option.
> > 2. Yes correct it going to impact query  performance. And you may got
> > large uri error.
> > 3 ya you can pass ids internally by writing any custom parser.or divide
> > data into different shard.
> >
> >
> >
> > On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:
> >
> > > Hi all,
> > >
> > > What we are doing is, we will be having a set of unique Ids of solr
> > > document at max 1500, we need to run faceting and sorting among them.
> > > there is no direct search involved.
> > > It's a head-on search since we already know the document unique keys
> > > beforehand.
> > >
> > > 1. Is Solr a better use case for such kind of problem?
> > > 2. Since we will be passing 1500 unique document ids, As per my
> > > understanding it will impact query tree as it will grow bigger. Will
> > > there be any other impacts?
> > > 3. Is it wise to use or solve the situation in this way?
> > >
> > >
> > > --
> > > Warm Regards,
> > >
> > > Lucky Sharma
> > >
>
>
>
> --
> Warm Regards,
>
> Lucky Sharma
> Contact No :+91 9821559918
>


Re: Large Filter Query

2019-06-26 Thread Lucky Sharma
Thanks, Jagdish.
But what if we need to perform search and filtering on those 1.5k doc
id results? Also, for the URI error, we can go with the POST approach,
and what if the data is not sharded?

Regards,
Lucky Sharma

On Thu, Jun 27, 2019 at 12:28 AM jai dutt  wrote:
>
> 1. No Solr is not for id search.  rdms a better option.
> 2. Yes correct it going to impact query  performance. And you may got
> large uri error.
> 3 ya you can pass ids internally by writing any custom parser.or divide
> data into different shard.
>
>
>
> On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:
>
> > Hi all,
> >
> > What we are doing is, we will be having a set of unique Ids of solr
> > document at max 1500, we need to run faceting and sorting among them.
> > there is no direct search involved.
> > It's a head-on search since we already know the document unique keys
> > beforehand.
> >
> > 1. Is Solr a better use case for such kind of problem?
> > 2. Since we will be passing 1500 unique document ids, As per my
> > understanding it will impact query tree as it will grow bigger. Will
> > there be any other impacts?
> > 3. Is it wise to use or solve the situation in this way?
> >
> >
> > --
> > Warm Regards,
> >
> > Lucky Sharma
> >



-- 
Warm Regards,

Lucky Sharma
Contact No :+91 9821559918


Re: Large Filter Query

2019-06-26 Thread jai dutt
1. No, Solr is not really meant for pure id search; an RDBMS is a better option.
2. Yes, correct, it is going to impact query performance, and you may get a
"URI too large" error.
3. Yes, you can pass the ids internally by writing a custom query parser, or divide
the data into different shards.



On Thu, 27 Jun, 2019, 12:01 AM Lucky Sharma,  wrote:

> Hi all,
>
> What we are doing is, we will be having a set of unique Ids of solr
> document at max 1500, we need to run faceting and sorting among them.
> there is no direct search involved.
> It's a head-on search since we already know the document unique keys
> beforehand.
>
> 1. Is Solr a better use case for such kind of problem?
> 2. Since we will be passing 1500 unique document ids, As per my
> understanding it will impact query tree as it will grow bigger. Will
> there be any other impacts?
> 3. Is it wise to use or solve the situation in this way?
>
>
> --
> Warm Regards,
>
> Lucky Sharma
>


Re: Large Filter Query

2019-06-26 Thread Lucky Sharma
@Shawn: Sorry I forgot to mention the corpus size: the corpus size is
around 3 million docs, where we need to query for 1500 docs and run
aggregations, sorting, search on them.

@David: But will that not be a performance hit (resource intensive)?
Since it will have that many terms to search upon, the query parse
tree will be big, isn't it?


Re: Large Filter Query

2019-06-26 Thread David Hastings
you can use the !terms operator and send them separated by a comma:

{!terms f=id}id1,id2,..id1499,id1500

and run facets normally
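
A variation on the same idea (parameter values below are only illustrative):
put the terms query in an fq, so a normal q can still search within that fixed
set while faceting and sorting as usual:

   q=chocolate
   fq={!terms f=id}id1,id2,...,id1500
   facet=true
   facet.field=brand
   sort=price asc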


On Wed, Jun 26, 2019 at 2:31 PM Lucky Sharma  wrote:

> Hi all,
>
> What we are doing is, we will be having a set of unique Ids of solr
> document at max 1500, we need to run faceting and sorting among them.
> there is no direct search involved.
> It's a head-on search since we already know the document unique keys
> beforehand.
>
> 1. Is Solr a better use case for such kind of problem?
> 2. Since we will be passing 1500 unique document ids, As per my
> understanding it will impact query tree as it will grow bigger. Will
> there be any other impacts?
> 3. Is it wise to use or solve the situation in this way?
>
>
> --
> Warm Regards,
>
> Lucky Sharma
>


Re: Large Filter Query

2019-06-26 Thread Shawn Heisey

On 6/26/2019 12:31 PM, Lucky Sharma wrote:

What we are doing is, we will be having a set of unique Ids of solr
document at max 1500, we need to run faceting and sorting among them.
there is no direct search involved.
It's a head-on search since we already know the document unique keys beforehand.

1. Is Solr a better use case for such kind of problem?
2. Since we will be passing 1500 unique document ids, As per my
understanding it will impact query tree as it will grow bigger. Will
there be any other impacts?
3. Is it wise to use or solve the situation in this way?


Where exactly does the number "1500" fit in?  It's not clear from what 
you've said.  If there will be 1500 documents total, that's an extremely 
small index.  If that's the number of values in a single query, there 
are solutions for any of the problems that might arise as a result of that.


When you ask whether Solr is better, better than what?

More detail will be needed in order to provide any useful information.

Thanks,
Shawn


Large Filter Query

2019-06-26 Thread Lucky Sharma
Hi all,

What we are doing is, we will be having a set of unique Ids of solr
document at max 1500, we need to run faceting and sorting among them.
there is no direct search involved.
It's a head-on search since we already know the document unique keys beforehand.

1. Is Solr a better use case for such kind of problem?
2. Since we will be passing 1500 unique document ids, As per my
understanding it will impact query tree as it will grow bigger. Will
there be any other impacts?
3. Is it wise to use or solve the situation in this way?


-- 
Warm Regards,

Lucky Sharma


Re: Invoice 6873 from Sobek Digital Hosting and Consulting, LLC 26.06.19

2019-06-26 Thread Mark Sullivan
All,


THIS EMAIL IS PHISHING AND IMPERSONATED MY EMAIL ADDRESS.


PLEASE IGNORE!


Mark


From: Mark Sullivan
Sent: Wednesday, June 26, 2019 1:29:09 PM
Subject: Invoice 6873 from Sobek Digital Hosting and Consulting, LLC 26.06.19


Hi,



Mark used box to share INV-6873



Kindly press REVIEW DOCUMENT 
 to access the secure 
document



Please let us know if there is any skipped invoices.



Thank you

Mark V. Sullivan

CIO & Application Architect

Sobek Digital Hosting and Consulting, LLC

mark.v.sulli...@sobekdigital.com

866-981-5016 (office)

352-682-9692 (mobile)




Invoice 6873 from Sobek Digital Hosting and Consulting, LLC 26.06.19

2019-06-26 Thread Mark Sullivan
Hi,



Mark used box to share INV-6873



Kindly press REVIEW DOCUMENT 
 to access the secure 
document



Please let us know if there is any skipped invoices.



Thank you
Mark V. Sullivan
CIO & Application Architect
Sobek Digital Hosting and Consulting, LLC
mark.v.sulli...@sobekdigital.com
866-981-5016 (office)
352-682-9692 (mobile)



Solr 6.6.0 - Multiple DataSources - Performance / Delta Issues - MSSQL(Azure)

2019-06-26 Thread Joseph_Tucker
I've currently got a data configuration that uses multiple dataSources.
I have a main dataSource that contains shared inventory data, and individual
dataSources that contain price data that differs from database to database.
(I have little to no say in how the Databases can be structured)

The scenario is: I have multiple shops (x amount, but for the sake of this
example, say 10 shops)
Each shop will contain the same inventory data about products. However, each
shop will contain different price data per product. 
Example: Shop1 has Chocolate for $1 and Shop2 has Chocolate for $0.95

My configuration looks something like this:



 ...
  
   
  
  
   
  
  
   
  
  ...


A few issues I've noticed when testing this on a local machine.
1) Performance on full indexes degrades with each price entity that I add.
With only three prices, I'm seeing indexing slow to as little as 25 records per
second.
Is there a better way to go about gathering the price data?

2) When performing deltas, I cannot use dataimporter.last_index_time as I do
not have anything to compare it to (at least not that I'm immediately aware
of).
I have a table that I've been able to use that contains a column called
"LastTime" of the type BigInt. 
I use this column to update with the global variable @@DBTS after each
Full-Index and after each deltaQuery
i.e.
query="
DECLARE @LatestUpdate AS bigint;
SET @LatestUpdate = (SELECT @@DBTS);

Select ... <- main select to get all the data

UPDATE [SolrQueue]
SET [LastTime] = @LatestUpdate
FROM [SolrQueue] "

^ similar in deltaQuery


I have a parentDeltaQuery in each price entity that is sending the product
IDs back to the root entity 
( select id from Products where id = '${price2.id}' )
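
(For reference, the canonical shape DIH expects for this -- table, column and
dataSource names below are invented, and it assumes a last-modified column to
compare last_index_time against, which is exactly what I don't have:

   <entity name="product" pk="ID" dataSource="main"
           query="SELECT * FROM Products"
           deltaQuery="SELECT ID FROM Products WHERE LastModified &gt; '${dataimporter.last_index_time}'"
           deltaImportQuery="SELECT * FROM Products WHERE ID = '${dih.delta.ID}'">
     <entity name="price2" dataSource="shop2"
             query="SELECT Price FROM Prices WHERE ProductID = '${product.ID}'"
             deltaQuery="SELECT ProductID FROM Prices WHERE LastModified &gt; '${dataimporter.last_index_time}'"
             parentDeltaQuery="SELECT ID FROM Products WHERE ID = '${price2.ProductID}'"/>
   </entity>
)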

The issue that comes up here is that when the delta is running, I get a table
lock. Is there a better method to retrieve which prices have changed?


Any assistance would be greatly appreciated.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Solr 6.6.0 - Indexing Multiple DataSources multiple child entities

2019-06-26 Thread Joseph_Tucker
[Using Solr 6.6.0]

I've currently got a few databases that I'm indexing.
To give the scenario: I have 10 different shops
Each shop will have the same inventory, but different price tags on each
item.
(i.e. Shop 1 sells Chocolate for $1, and Shop 2 sells Chocolate for $0.95...
etc)
I'm connecting to an SQL Database for the Inventory information and a
separate Database for each individual Shop price information (I don't have
much control over how the database is structured)

The way my db-config.xml file is structured is something like this:


 
   ...
 
   
  
 
   
  
 
   
  



a few problems I'm running into...

Firstly: I'm seeing really slow indexing the more shops I add. Is there a
better way to go about this?

Secondly: how can I ensure I get prices updated if the only DB that changes
when I run a delta is Shop3? ... etc.

Thirdly: I can't seem to use dataimporter.last_index_time as there is no
"last updated" column in the database. 
I have a separate table that stores the @@DBTS (from mssql) after each
full-import *or* delta-import. 

The problem is, I need to run this on each DB and as such, each entity under
the root entity, which can cause lock issues for each sql update that's run.
This is far from efficient, far from best practices...  however I'm not sure
of a better way to go about this.

Any help will be more than appreciated

Thanks

Joe






--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Postgres Large Object Facility

2019-06-26 Thread Christopher Spooner
I am using Solr 7.7.2 and trying to index binary data that is stored in
PostgreSQL's large object feature (OID type / lo module) and not directly
in the database.  Is this possible?  If so, are there any examples of others
configuring Solr in this way?

Attached are my db-data-config and managed-schema files for reference.
This same file works against an oracle database with the same data.



  




  



  file_name 
  


		

	
	
	  







Re: Replication issue with version 0 index in SOLR 7.5

2019-06-26 Thread Patrick Bordelon
One other question related to this.

I know the change was made for a specific problem that was occurring, but has
this caused a problem similar to mine for anyone else?

We're looking to try changing the second 'if' statement to add an extra
conditional to prevent it from performing the "deleteAll" operation unless
absolutely specified.

The idea is to use the skipCommitOnMasterVersionZero and set it so that the
if statement will never be true on a new generation index on the primary.

We're going to try some modifications on our polling strategy as a temporary
solution while we test out changing that section of the index fetcher.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: SolrInputDocument setField method

2019-06-26 Thread Shawn Heisey

On 6/26/2019 9:52 AM, Vincenzo D'Amore wrote:

I have a very basic question related to the SolrInputDocument behaviour.

Looking at SolrInputDocument source code I found how the method setField
works:

   public void setField(String name, Object value )
   {
 SolrInputField field = new SolrInputField( name );
 _fields.put( name, field );
 field.setValue( value );
   }

The field name is "duplicated" into the SolrInputField.


What this does is create an entirely new SolrInputField object -- one 
that does not have a value.  Then it puts that object into a map of all 
fields for this document.  Then it assigns the value directly to the 
Field object, which is already inside the map.


Side note:  The "put" method used there will replace any existing field 
with the same name, turning that field object into garbage that Java 
will eventually collect.


If there is already an existing Field object in the document's map 
object with the same name, it will likely have no references, so the 
garbage collector will eventually collect that object and its component 
objects.


The only duplication I can see here is that both the inner field object 
and the outer map contain the name of the field.  Unless you have a 
really huge number of fields, this would not have a significant impact 
on the amount of memory required.
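
A minimal SolrJ sketch of that duplication (field name and value are invented
for illustration):

   import org.apache.solr.common.SolrInputDocument;

   public class SetFieldDemo {
     public static void main(String[] args) {
       SolrInputDocument doc = new SolrInputDocument();
       // map key "color" -> a SolrInputField whose name is also "color"
       doc.setField("color", "red");
       System.out.println(doc.getField("color").getName()); // color
       System.out.println(doc); // prints something like: SolrInputDocument(fields: [color=red])
     }
   }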


The map object (_fields) that basically represents the whole document 
needs *something* to map each entry.  The field name is convenient and 
relevant.  It is also usually a fairly short string.


It is likely that other code that uses a SolrInputField object will only 
have that object, not the map, so the name of the field must be in the 
field object.


It is probably possible to achieve slightly better memory efficiency by 
switching the internal implementation from Map to List or Set ... but it 
would make SolrInputDocument MUCH less efficient in other ways, 
including the setField method you have quoted above.  I do not think it 
would be a worthwhile trade.


Thanks,
Shawn


Re: SolrInputDocument setField method

2019-06-26 Thread Samuel Kasimalla
Hi Vincenzo,

Maybe looking at the overridden toString() would give you a clue.

The second part: I don't think SolrJ holds it twice (if you are worried
about redundant usage of memory), BUT if you haven't used SolrJ so far and
wanted to know whether this is the format in which it pushes to Solr, I'm pretty
sure it doesn't push this format into Solr.

Thanks,
Sam
https://www.linkedin.com/in/skasimalla

On Wed, Jun 26, 2019 at 11:52 AM Vincenzo D'Amore 
wrote:

> Hi all,
>
> I have a very basic question related to the SolrInputDocument behaviour.
>
> Looking at SolrInputDocument source code I found how the method setField
> works:
>
>   public void setField(String name, Object value )
>   {
> SolrInputField field = new SolrInputField( name );
> _fields.put( name, field );
> field.setValue( value );
>   }
>
> The field name is "duplicated" into the SolrInputField.
>
> For example, if I'm storing a field "color" with value "red"  what we have
> is a Map like this:
>
> { "key" : "color", "value" : { "name" : "color", "value" : "red" } }
>
> the name field "color" appears twice. Very likely there is a reason for
> this, could you please point me in the right direction?
>
> For example, I'm worried about at what happens with SolrJ when I'm sending
> a lot of documents, where for each field the fieldName is sent twice.
>
> Thanks,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
>


Re: giving weight for SynonymFilterFactory terms

2019-06-26 Thread Ruslan Dautkhanov
Any way in Solr to give weights to synonyms?

Thanks.


On Sun, Jun 23, 2019 at 4:39 PM Ruslan Dautkhanov 
wrote:

> Hello!
>
> Is there is a way for Solr to assign weights for terms produced
> by SynonymFilterFactory?
>
> We'd like to give smaller weight for synonyms words/terms injected by
> SynonymFilterFactory.
> So exact matches weigh higher and get a higher score.
>
> First use case is to just give one static weight for all synonyms
> and if search-time matches through synonyms it'll have a certain (lower)
> weight than exact match.
>
> Can't find this in documentation.
>
> Another use case is to fine-tune each synonyms with a particular weight
> for each particular synonym (i.e. synonyms="synonyms.txt" would have 3
> columns and not 2). It seems not currently possible, so perhaps just
> static
> weight for all synonyms described above would be possible.
>
> Any pointers highly appreciated.
>
> Thank you,
> Ruslan
>
>


SolrInputDocument setField method

2019-06-26 Thread Vincenzo D'Amore
Hi all,

I have a very basic question related to the SolrInputDocument behaviour.

Looking at SolrInputDocument source code I found how the method setField
works:

  public void setField(String name, Object value )
  {
SolrInputField field = new SolrInputField( name );
_fields.put( name, field );
field.setValue( value );
  }

The field name is "duplicated" into the SolrInputField.

For example, if I'm storing a field "color" with value "red"  what we have
is a Map like this:

{ "key" : "color", "value" : { "name" : "color", "value" : "red" } }

the name field "color" appears twice. Very likely there is a reason for
this, could you please point me in the right direction?

For example, I'm worried about what happens with SolrJ when I'm sending
a lot of documents, where for each field the fieldName is sent twice.

Thanks,
Vincenzo


-- 
Vincenzo D'Amore


Migrating from JdbcDataSource to ContentStreamDataSource

2019-06-26 Thread Reinharn
I'm trying to get off my JDBC data source and move to a streaming data source.
I have successfully implemented a node.js API that will push items to my
Solr index using the /update/json handler, which is defined out of the box as:
 


This process replaces the 'delta' 
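
(For anyone following along, pushing documents that way looks roughly like
this -- the field names are made up, and the collection name matches the one
used later in this message:

   curl -X POST -H 'Content-Type: application/json' \
        'http://localhost:8983/solr/COLLECTION2/update/json?commit=true' \
        -d '[{"id":"1","title":"first doc"},{"id":"2","title":"second doc"}]'
)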

We still have our /dataimport DataImportHandler that handles our 'full
import', which uses a JDBC connection and looks like the following:

solrconfig.xml




data-config.xml
false



data-config.xml (partial)

 



 
 
 
 
 
 
 



I would really like to be able to just stream my indexing and ditch the jdbc
one. I have a couple questions.

1. Does the ContentStreamDataSource post out to an API, or does it wait for
something to post to it?
2. Does ContentStreamDataSource have a JSON processor? I only see
XPathEntityProcessor for XML.
3. Is there a way to get the status of this stream?
  - Right now I can hit
/COLLECTION2/dataimport?command=status&indent=on&wt=json
  - It responds with:
{
  "responseHeader":{
"status":0,
"QTime":0},
  "initArgs":[
"defaults",[
  "config","data-config.xml",
  "clean","false"]],
  "command":"status",
  "status":"idle",
  "importResponse":"",
  "statusMessages":{
"Total Requests made to DataSource":"0",
"Total Rows Fetched":"0",
"Total Documents Processed":"0",
"Total Documents Skipped":"0",
"Time taken":"0:0:0.0"}
}


My gut was to implement it like this:

solrconfig.xml




stream-data-config.xml
false



stream-data-config.xml


  
 


 
 
 
 
 
 
 



I think I might be crossing some streams here on how this all works. Any
advice is appreciated.

Thanks,

Nate







--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Suggester not returning all possible completions for a query in a large index

2019-06-26 Thread Christian Ortner
Hello Everyone,

I'm using suggesters with Solr 6.4 to get suggestions for a field with a
decent number of different values across a large number of documents that
is configured like this:


vendorSuggester
BlendedInfixLookupFactory
600
false
DocumentDictionaryFactory
attrib_vendor
group
keyword
false
text_result_autocomplete
vendorSuggester


The field type is configured like this, although I'm pretty sure that it's
not the culprit because I tried multiple field types with no improvement:



















It works very well for small data sets, but for larger ones, fewer
suggestions than requested with count are returned, although I know that
the data set contains more values suitable for completion. Even with a
query that matches the expected suggestion exactly, I don't get that
suggestion. In particular, it doesn't seem to suggest shorter values with
the same prefix, only the longest, but that might just be the cases I
tested.

I already remedied the situation by cranking up numFactor, but that only
makes users less likely to experience this problem, and increasing
numFactor further would make performance unacceptable.

Unfortunately, I can't use other lookup factory implementations, because
context filtering is necessary to limit suggestions to certain groups of
users. AnalyzingInfixLookupFactory, which also supports context fields,
doesn't help with the situation.
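
For context, the requests in question look roughly like this (the dictionary
name follows the config above; the query string and context value are made
up):

   /solr/<collection>/suggest?suggest=true
       &suggest.dictionary=vendorSuggester
       &suggest.q=acme
       &suggest.count=20
       &suggest.cfq=groupA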

Are there any ideas on how to solve getting exhaustive suggestions with
decent performance for large data sets? I'd appreciate any hints.

Cheers,
Chris


Problems using a suggester component in the /select handler in cloud mode

2019-06-26 Thread Alexandros Paramythis

Hi everyone,

Environment:

Solr 7.5.0, cloud mode (but problem should be identical in multiple 
versions, at least in 7.x)


Summary:

We have a Solr configuration that returns suggestions in the course of a 
normal search call (i.e., we have a 'suggest' component added to the 
'last-components' for '/select' request handler). This does not work in 
cloud mode, where we get an NPE in QueryComponent. This problem seems to 
have been reported in various forms in the past -- see for example [1] 
and [2] (links at the end of this email) -- but we couldn't find any 
resolution (or in-depth discussion for that matter).


In more detail:

We have a suggest component configured as follows:

  

    
    default
    name="classname">org.apache.solr.spelling.suggest.Suggester
    name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory

    dict_default
    text_suggest
    text_suggest
    true
    true
    true
    

    
    suggest_phrase
    name="lookupImpl">org.apache.solr.spelling.suggest.fst.AnalyzingLookupFactory

    dict_suggest_phrase
    name="suggestAnalyzerFieldType">text_suggest_phrase

    suggest_phrase
    true
    true
    true
    

    
    suggest_infix_shingle
    AnalyzingInfixLookupFactory
    suggestInfixShingleDir
    name="suggestAnalyzerFieldType">text_suggest_phrase

    suggest_phrase
    true
    true
    true
    

    
    suggest_prefix
    Suggester
    AnalyzingLookupFactory
    name="suggestAnalyzerFieldType">text_suggest_prefix

    suggest_prefix
    true
    true
    true
    

  


This component works without issue both in standalone and cloud mode, 
when used as the sole component in a handler, such as in the following 
excerpt:


    startup="lazy">

    
    default
        suggest_phrase
    name="suggest.dictionary">suggest_infix_shingle

    suggest_prefix
    true
    10
    false
    
    
    suggest
    
    


It also works when used along with other components in standalone mode, 
such as in the following excerpt, where we use the suggest component to 
get suggestions during a "normal" search call:


    
    
    explicit
    10
    text_search

    edismax

    title^5.0 subtitle^3.0 
abstract^2.0 text_search
    title^5.0 subtitle^3.0 
abstract^2.0 text_search

    4
    on
    default
    true
    10
    5
    5
    true
    name="spellcheck.collateExtendedResults">true

    10
    5

    default
    suggest_phrase
    name="suggest.dictionary">suggest_infix_shingle

    suggest_prefix
    true
    10
    false
    

    
    suggest
    spellcheck
    
    

However, the above configuration does not work in cloud mode, where we 
get an NPE if a search call is made:


 o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
at 
org.apache.solr.handler.component.QueryComponent.unmarshalSortValues(QueryComponent.java:1034)
at 
org.apache.solr.handler.component.QueryComponent.mergeIds(QueryComponent.java:885)
at 
org.apache.solr.handler.component.QueryComponent.handleRegularResponses(QueryComponent.java:585)
at 
org.apache.solr.handler.component.QueryComponent.handleResponses(QueryComponent.java:564)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:426)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2541)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
at