Solr for Content Management

2018-06-07 Thread Moenieb Davids
Hi All,

Background:
I am currently testing a deployment of a content management framework where
I am trying to punt Solr as the tool of choice for ingestion and searching.

Current status:
I have deployed SolrCloud across multiple servers with multiple shards and
a replication factor of 2.
In terms of collections, I have a person collection that contains details of
individuals, including address and high-level portfolio info. Structurally,
this collection nests down to great-grandchildren.
Then I have a few collections that deal with content. For now, content is
just emails and documents with a max size of 2MB, with certain user
exceptions that can go higher than 2MB.
Content is indexed twice in terms of the actual content, firstly as
binary/stream and then as readable text. Metadata is negligible.


Challenges:
When performing full-text searches without concurrently executing updates,
Solr seems to be doing well. Running updates alone also does okay given the
nature of the transaction. However, when I run searches and updates
simultaneously, performance drops quite significantly. I have played with
field properties, analyzers, tokenizers, shard sizes, etc.
Any advice?
I would like to know if anyone has done something similar. Please excuse the
long-winded message.
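
In case commits are the culprit: search-vs-update contention is frequently caused by commit policy rather than analysis, since every commit that opens a new searcher also invalidates caches while indexing runs. A minimal sketch of loosening this via the Config API (the collection name and timings are illustrative, not taken from this setup):

# hard-commit regularly for durability without opening a searcher;
# let less-frequent soft commits control visibility
curl http://localhost:8983/solr/person/config \
  -H 'Content-type:application/json' -d '{
    "set-property": {
      "updateHandler.autoCommit.maxTime": 60000,
      "updateHandler.autoCommit.openSearcher": false,
      "updateHandler.autoSoftCommit.maxTime": 30000
    }
  }'

If the update clients send commit=true explicitly, removing that usually helps more than analyzer or tokenizer tuning.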


-- 
Sent from Gmail Mobile





Re: [ANNOUNCE] Apache Solr 5.5.5 released

2017-10-24 Thread Moenieb Davids
Hi Steve,

I have just started with Solr 7.*, so I am a bit confused by 5.5.5; same
with Lucene.
Also, the sites list versions 7.*.
Apologies for my ignorance if I have missed anything or do not have a proper
understanding of the version management.

Regards
Moenieb

On Tue, Oct 24, 2017 at 6:27 PM, Steve Rowe <sar...@gmail.com> wrote:

> Yes.
>
> --
> Steve
> www.lucidworks.com
>
> > On Oct 24, 2017, at 12:25 PM, Moenieb Davids <moenieb.dav...@gmail.com>
> wrote:
> >
> > Solr 5.5.5?
> >
> > On 24 Oct 2017 17:34, "Steve Rowe" <sar...@gmail.com> wrote:
> >
> >> 24 October 2017, Apache Solr™ 5.5.5 available
> >>
> >> The Lucene PMC is pleased to announce the release of Apache Solr 5.5.5.
> >>
> >> Solr is the popular, blazing fast, open source NoSQL search platform
> from
> >> the
> >> Apache Lucene project. Its major features include powerful full-text
> >> search,
> >> hit highlighting, faceted search and analytics, rich document parsing,
> >> geospatial search, extensive REST APIs as well as parallel SQL. Solr is
> >> enterprise grade, secure and highly scalable, providing fault tolerant
> >> distributed search and indexing, and powers the search and navigation
> >> features
> >> of many of the world's largest internet sites.
> >>
> >> This release contains one bugfix.
> >>
> >> This release includes one critical and one important security fix.
> Details:
> >>
> >> * Fix for a 0-day exploit (CVE-2017-12629), details:
> >> https://s.apache.org/FJDl.
> >> RunExecutableListener has been disabled by default (can be enabled by
> >> -Dsolr.enableRunExecutableListener=true) and resolving external
> entities
> >> in the
> >> XML query parser (defType=xmlparser or {!xmlparser ... }) is disabled by
> >> default.
> >>
> >> * Fix for CVE-2017-7660: Security Vulnerability in secure inter-node
> >> communication
> >> in Apache Solr, details: https://s.apache.org/APTY
> >>
> >> Furthermore, this release includes Apache Lucene 5.5.5 which includes
> one
> >> security
> >> fix since the 5.5.4 release.
> >>
> >> The release is available for immediate download at:
> >>
> >> http://www.apache.org/dyn/closer.lua/lucene/solr/5.5.5
> >>
> >> Please read CHANGES.txt for a detailed list of changes:
> >>
> >> https://lucene.apache.org/solr/5_5_5/changes/Changes.html
> >>
> >> Please report any feedback to the mailing lists
> >> (http://lucene.apache.org/solr/discussion.html)
> >>
> >> Note: The Apache Software Foundation uses an extensive mirroring
> >> network for distributing releases. It is possible that the mirror you
> >> are using may not have replicated the release yet. If that is the
> >> case, please try another mirror. This also goes for Maven access.
>
>



Deeply nested search return

2017-09-02 Thread Moenieb Davids
Hi All,

I would like to know if anybody has done deeply nested searches.
I am currently sitting with the use case below:
Successfully Indexed Document:

Level1_Doc
  - ID
  - DocType
  - Level2_Doc
      - ID
      - DocType
      - Level3_Doc
          - ID
          - DocType
          - Level4_Doc
              - ID
              - DocType

What is the best approach to get the search result back in the same structure?
Having a child structure seems quite easy for searching and retrieving a
nested structure using BJQ and the ChildEntityProcessor; however, things seem
to get trickier once you go to grandchildren and beyond.
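
For the retrieval side, a minimal sketch using the [child] doc transformer (the collection name is made up, and parentFilter must match only the root-level docs):

# return matching root docs plus all of their descendants
# (-g stops curl from treating [] as a glob pattern)
curl -g "http://localhost:8983/solr/mycollection/select?q=DocType:Level1_Doc&fl=*,[child%20parentFilter=DocType:Level1_Doc%20limit=100]"

The caveat is that the transformer returns descendants as one flat _childDocuments_ list rather than re-nested per level, so rebuilding the Level2/Level3/Level4 hierarchy remains client-side work.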


RE: Solr Search Handler Suggestion

2017-01-26 Thread Moenieb Davids
Hi Mikhail,

The per-row scenario would cater for queries that are looking at specific rows.
For example, I need the address and bank details of a member that is stored on a
different core.

I guess what I am trying to do is get Solr search functionality that is similar
to a DB: something which I can easily plug my various corporate solutions into so
that they can retrieve info.
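
For that per-row lookup, the closest out-of-the-box tool is the cross-core join parser; a sketch with placeholder core and field names (none of these are from the thread):

# members whose id is referenced by matching bank-detail docs;
# fromIndex must live on the same node (-g disables curl globbing of {})
curl -g "http://localhost:8983/solr/members/select?q={!join%20from=member_id%20to=id%20fromIndex=bankdetails}account_status:active"

It only filters the parent side, though; the joined fields are not embedded in the response, which is exactly the gap this handler idea is about.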

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: 26 January 2017 09:23 AM
To: solr-user
Subject: Re: Solr Search Handler Suggestion

Hello, Moenieb.

It is worth mentioning that it is not effective to include java-user@ in this
thread.
Also, this proposal concerns DIH, which would be worth mentioning in the subject.
Then, this config looks like it will issue a Solr request for every parent row,
which is deadly inefficient.

On Wed, Jan 25, 2017 at 10:53 AM, Moenieb Davids <moenieb.dav...@gpaa.gov.za
> wrote:

> Hi Guys,
>
> Just an Idea for easier config of search handlers:
>
> Will it be feasible to configure a search handler that has its own
> schema based on the current core, as well as inserting nested objects
> from cross-core queries?
>
> Example (for illustration purpose, ignore syntax :) )
>
>   <searchHandler name="/withnested">
>     <field name="id" />
>     <field name="name" />
>     <nestedObject name="liked_items"
>         core="http://localhost:8983/solr/items" query="user_liking_this:${thiscore.id}">
>       <field name="id" />
>       <field name="name" />
>     </nestedObject>
>     <nestedObject name="liked_items_detail"
>         core="http://localhost:8983/solr/items" query="user_liking_this:${thiscore.id}">
>       <field name="id" />
>       <field name="name" />
>     </nestedObject>
>   </searchHandler>
>
> This will allow you to create endpoints that interact with and return fields
> and their values from other cores, and seems to be possibly easier to manage?



--
Sincerely yours
Mikhail Khludnev












Solr Search Handler Suggestion

2017-01-24 Thread Moenieb Davids
Hi Guys,

Just an Idea for easier config of search handlers:

Will it be feasible to configure a search handler that has its own schema based
on the current core, as well as inserting nested objects from cross-core queries?

Example (for illustration purpose, ignore syntax :) )

  <searchHandler name="/withnested">
    <field name="id" />
    <field name="name" />
    <nestedObject name="liked_items"
        core="http://localhost:8983/solr/items" query="user_liking_this:${thiscore.id}">
      <field name="id" />
      <field name="name" />
    </nestedObject>
    <nestedObject name="liked_items_detail"
        core="http://localhost:8983/solr/items" query="user_liking_this:${thiscore.id}">
      <field name="id" />
      <field name="name" />
    </nestedObject>
  </searchHandler>

This will allow you to create endpoints that interact with and return fields and their
values from other cores, and seems to be possibly easier to manage?
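
Worth noting: later Solr releases (6.1 onwards) cover part of this ground with the [subquery] document transformer, which attaches per-row query results as nested documents. A rough single-collection sketch (collection, field, and subquery names are invented for illustration):

# each returned user doc gains a nested "liked" result set, driven by a
# field of the enclosing row via $row.<field> (-g disables curl globbing)
curl -g "http://localhost:8983/solr/users/select?q=*:*&fl=id,name,liked:[subquery]&liked.q={!terms%20f=user_liking_this%20v=\$row.id}&liked.rows=5"

Cross-core wiring is less direct than in the proposal above, but the response shape comes close to what is sketched here.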












RE: Joining Across Collections

2017-01-20 Thread Moenieb Davids
Hi Guys

Just a quick question on search and join:

I have a few cores based on a mainframe extract, 1 core per extracted
file, each of which resembles a "DB table".
The cores are all somehow linked via 1-to-many fields, with a structure similar
to a normal ERD.

Is it possible to return the result of a query that joins, let's say, 3 cores in
the following format:

{
  "core1_id":"XXX",
  "_childDocuments_":[
    {
      "core2_id":"yyy",
      "core_2_fieldx":"ABC",
      "_childDocuments_":[
        {
          "core3_id":"zzz",
          "core_3_fieldx":"ABC",
          "core3_fieldy":"123"
        }
      ],
      "core2_fieldy":"123"
    }
  ]
}

Regards
Moenieb Davids

-Original Message-
From: nabil Kouici [mailto:koui...@yahoo.fr.INVALID] 
Sent: 20 January 2017 03:43 PM
To: solr-user@lucene.apache.org
Subject: Re: Joining Across Collections

Hi Mikhail,
I'm not sure we can do a negation, because Field1 and Field2 are not in the join
condition.
Regards, Nabil.

From: Mikhail Khludnev <m...@apache.org>
To: solr-user <solr-user@lucene.apache.org>; nabil Kouici <koui...@yahoo.fr>
Sent: Thursday, 19 January 2017, 09:00
Subject: Re: Joining Across Collections
It seems like it can be done by just negating the join query, or I'm missing
something.

On Wed, Jan 18, 2017 at 11:32 AM, nabil Kouici <koui...@yahoo.fr.invalid>
wrote:

> Hi All,
> I'm using the join-across-collections feature to do an inner join between 2
> collections. It works fine.
> Is it possible to use this feature to compare fields from
> different collections? For example:
> Collection1: Field1    Collection2: Field2
> search documents from Collection1 where Field1 != Field2
> In SQL, this would translate to:
> Select A.* From Collection1 A inner join Collection2 B on A.id = B.id
> Where A.Field1 <> B.Field2
>
> Thank you.
> Regards, NKI.
>



-- 
Sincerely yours
Mikhail Khludnev


   












RE: Joining Across Collections

2017-01-19 Thread Moenieb Davids
Hi Guys

Just a quick question on search and join:

I have a few cores based on a mainframe extract, 1 core per extracted
file, each of which resembles a "DB table".
The cores are all somehow linked via 1-to-many fields, with a structure similar
to a normal ERD.

Is it possible to return the result of a query that joins, let's say, 3 cores in
the following format:

{
  "core1_id":"XXX",
  "_childDocuments_":[
    {
      "core2_id":"yyy",
      "core_2_fieldx":"ABC",
      "_childDocuments_":[
        {
          "core3_id":"zzz",
          "core_3_fieldx":"ABC",
          "core3_fieldy":"123"
        }
      ],
      "core2_fieldy":"123"
    }
  ]
}

Regards
Moenieb Davids

-Original Message-
From: Mikhail Khludnev [mailto:m...@apache.org] 
Sent: 19 January 2017 10:00 AM
To: solr-user; nabil Kouici
Subject: Re: Joining Across Collections

It seems like it can be done by just negating the join query, or I'm missing
something.

On Wed, Jan 18, 2017 at 11:32 AM, nabil Kouici <koui...@yahoo.fr.invalid>
wrote:

> Hi All,
> I'm using the join-across-collections feature to do an inner join between
> 2 collections. It works fine.
> Is it possible to use this feature to compare fields from
> different collections? For example:
> Collection1: Field1    Collection2: Field2
> search documents from Collection1 where Field1 != Field2
> In SQL, this would translate to:
> Select A.* From Collection1 A inner join Collection2 B on A.id = B.id
> Where A.Field1 <> B.Field2
>
> Thank you.
> Regards, NKI.
>



--
Sincerely yours
Mikhail Khludnev












RE: Search for ISBN-like identifiers

2017-01-17 Thread Moenieb Davids
Hi Guys

Just a quick question on search, which is not related to this post:

I have a few cores based on a mainframe extract, 1 core per extracted
file, each of which resembles a "DB table".
The cores are all somehow linked via 1-to-many fields, with a structure similar
to a normal ERD.

Is it possible to return the result of a query that joins, let's say, 3 cores in
the following format:

{
  "core1_id":"XXX",
  "_childDocuments_":[
    {
      "core2_id":"yyy",
      "core_2_fieldx":"ABC",
      "_childDocuments_":[
        {
          "core3_id":"zzz",
          "core_3_fieldx":"ABC",
          "core3_fieldy":"123"
        }
      ],
      "core2_fieldy":"123"
    }
  ]
}

Regards
Moenieb Davids

-Original Message-
From: Josh Lincoln [mailto:josh.linc...@gmail.com] 
Sent: 05 January 2017 08:57 PM
To: solr-user@lucene.apache.org
Subject: Re: Search for ISBN-like identifiers

Sebastian,
You may want to try adding autoGeneratePhraseQueries="true" to the fieldtype.
With that setting, a query for 978-3-8052-5094-8 will behave just like "978
3 8052 5094 8" (with the quotes).

A few notes about autoGeneratePhraseQueries:
a) it used to be set to true by default, but that was changed several years ago
b) it does NOT require a reindex, so it is very easy to test
c) it is apparently not recommended for non-whitespace-delimited languages (CJK,
etc.), but maybe that's not an issue in your use case.
d) I'm unsure how it'll impact wildcard queries on that field. E.g., will
978-3-8052* match 978-3-8052-5094-8? At the very least, partial ISBNs (e.g.
978-3-8052) would match the full ISBN without needing to use the wildcard. I'm just
not sure what happens if the user includes the wildcard.
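
A quick sketch of testing the difference (collection and field names are hypothetical):

# without the setting, the hyphenated form parses into separate OR'd terms;
# with autoGeneratePhraseQueries="true" it behaves like the quoted phrase
curl "http://localhost:8983/solr/books/select?q=isbn_text:978-3-8052-5094-8"
curl "http://localhost:8983/solr/books/select?q=isbn_text:%22978+3+8052+5094+8%22"

Since per note (b) the setting is purely query-time, flipping it and re-running the same query is enough to compare results.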

Josh

On Thu, Jan 5, 2017 at 1:41 PM Sebastian Riemer <s.rie...@littera.eu> wrote:

> Thank you very much for taking the time to help me!
>
> I'll definitely have a look at the link you've posted.
>
> @ShawnHeisey Thanks too for shedding light on the wildcard behaviour!
>
> Allow me one further question:
> - Assuming that I define a separate field for storing the ISBNs, using
> the awesome analyzer provided by Mr. Bill Dueber: how do I get that
> field copied into my general text field, which is used by my
> QuickSearch input?
> Won't that field be processed again by the analyzer defined on the
> text field?
> - Should I alternatively add more fields to the q parameter? As for
> now, I always have set q=text:<what_i_want_to_search>, but I
> guess one could try something like
> q=text:<what_i_want_to_search>+isbnspeciallookupfield:<what_i_want_to_search>
>
> I don't really know about that last idea though, since the searches
> are probably OR-combined, which is not what I would like to have.
>
> A third option would be to pre-process the decision of where to look
> in Solr in my application, of course. I.e., everything matching a
> regex containing only numbers and hyphens with length 13 -> don't
> query on field text; instead use field isbnspeciallookupfield.
>
>
> Many thanks again, and have a nice day!
> Sebastian
>
>
> -Original Message-
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Thursday, 5 January 2017 19:10
> To: solr-user@lucene.apache.org
> Subject: Re: Search for ISBN-like identifiers
>
> Sebastian -
>
> There's some precedent out there for ISBNs. Bill Dueber and the
> UMICH/code4lib folks have done amazing work, check it out here:
>
> https://github.com/mlibrary/umich_solr_library_filters
>
>   - Erik
>
>
> > On Jan 5, 2017, at 5:08 AM, Sebastian Riemer <s.rie...@littera.eu>
> wrote:
> >
> > Hi folks,
> >
> >
> > TL;DR: Is there an easy way to copy ISBNs with hyphens to the general
> > text field, or alternatively to configure the analyzer on that field, so that
> > a search for the hyphenated ISBN returns exactly the matching document?
> >
> > Long version:
> > I've defined a field "text" of type "text_general", where I copy all
> > my other fields to, to be able to do a "quick search" where I set
> > q=text:<what_i_want_to_search>
> >
> > The definition of the type text_general is like this:
> >
> > <fieldType name="text_general" class="solr.TextField"
> >     positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >         words="stopwords.txt" />
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.StandardTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory" ignoreCase="true"
> >         words="stopwords.txt" />

Missing Segment File

2017-01-15 Thread Moenieb Davids
Hi All,

How does one resolve the missing segments issue:
 java.nio.file.NoSuchFileException: /pathxxx/data/index/segments_1bj

It seems to only occur on large CSV imports via DIH.
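
For diagnosis, Lucene ships a CheckIndex tool that reports which segment files are intact; a rough sketch (the lucene-core jar path varies by install, so treat it as an assumption):

# run only against a stopped Solr, and back up the index directory first;
# adding -exorcise drops unrecoverable segments (losing their documents)
java -cp server/solr-webapp/webapp/WEB-INF/lib/lucene-core-*.jar \
  org.apache.lucene.index.CheckIndex /pathxxx/data/index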














RE: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Moenieb Davids
Hi,

Apologies for my response; I did not read the question properly.
I was speaking about splitting files for import.

-Original Message-
From: billnb...@gmail.com [mailto:billnb...@gmail.com] 
Sent: 09 January 2017 05:45 PM
To: solr-user@lucene.apache.org
Subject: Re: Help needed in breaking large index file into smaller ones

Can you set the Solr config's segment count to a higher number and not
optimize? You will get smaller files after a new index is created.

Can you reindex?

Bill Bell
Sent from mobile


> On Jan 9, 2017, at 7:15 AM, Narsimha Reddy CHALLA <chnredd...@gmail.com> 
> wrote:
> 
> No, it does not work by splitting. First of all, Lucene index files are
> not text files. There is a segments_N file which refers to the index
> files in a commit. So, when we split a large index file into smaller
> ones, the corresponding segments_N file also needs to be updated with the
> new index files OR a new segments_N file should be created, probably.
> 
> Can someone who is familiar with Lucene index files please help us in
> this regard?
> 
> Thanks
> NRC
> 
> On Mon, Jan 9, 2017 at 7:38 PM, Manan Sheth 
> <manan.sh...@impetus.co.in>
> wrote:
> 
>> Does this really work for Lucene index files?
>> 
>> Thanks,
>> Manan Sheth
>> 
>> From: Moenieb Davids <moenieb.dav...@gpaa.gov.za>
>> Sent: Monday, January 9, 2017 7:36 PM
>> To: solr-user@lucene.apache.org
>> Subject: RE: Help needed in breaking large index file into smaller 
>> ones
>> 
>> Hi,
>> 
>> Try split on Linux or Unix:
>>
>> split -l 100 originalfile.csv
>> This will split the file into chunks of 100 lines each.
>>
>> See the other options for how to split, e.g. by size.
>> 
>> 
>> -Original Message-
>> From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com]
>> Sent: 09 January 2017 12:12 PM
>> To: solr-user@lucene.apache.org
>> Subject: Help needed in breaking large index file into smaller ones
>> 
>> Hi All,
>> 
>>   My Solr server has a few large index files (say ~10G). I am
>> looking for some help on breaking them into smaller ones (each <
>> 4G) to satisfy my application requirements. Are there any such tools
>> available?
>> 
>> Appreciate your help.
>> 
>> Thanks
>> NRC
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 
>> 













RE: Help needed in breaking large index file into smaller ones

2017-01-09 Thread Moenieb Davids
Hi,

Try split on Linux or Unix:

split -l 100 originalfile.csv

This will split the file into chunks of 100 lines each.
See the other options for how to split, e.g. by size.
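
For example, a size-based split that keeps lines whole (GNU split; the flag and size are just an illustration):

# writes chunk_aa, chunk_ab, ... each at most 500 MB of complete lines
split -C 500m originalfile.csv chunk_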


-Original Message-
From: Narsimha Reddy CHALLA [mailto:chnredd...@gmail.com] 
Sent: 09 January 2017 12:12 PM
To: solr-user@lucene.apache.org
Subject: Help needed in breaking large index file into smaller ones

Hi All,

  My Solr server has a few large index files (say ~10G). I am looking for
some help on breaking them into smaller ones (each < 4G) to satisfy my
application requirements. Are there any such tools available?

Appreciate your help.

Thanks
NRC












OnError CSV upload

2017-01-09 Thread Moenieb Davids
Hi All,

Background:
I have a mainframe file that I want to upload, and the data is pipe-delimited.
Some of the records, however, have a few fields fewer than others within the same
file, and when I try to import the file, Solr has an issue with the number of
columns vs the number of values, which is correct.

Is there not a way, using the standard CSV upload, to continue on error and
perhaps get a log of the failed records?
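
Failing a built-in continue-on-error, a pre-filter along these lines separates good and bad rows before posting (the field count of 12 and the file names are placeholders):

# keep rows with exactly 12 pipe-delimited fields; divert the rest to a log
awk -F'|' 'NF == 12 { print > "clean.csv" }
           NF != 12 { print > "failed.log" }' mainframe_extract.csv

clean.csv then goes through /update/csv as usual, and failed.log records what was skipped.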














LineEntityProcessor | Separator --- /update/csv | OnError

2017-01-05 Thread Moenieb Davids
Hi,

Just wanted to know if anybody can assist with the following scenario:
I have pipe-delimited mainframe file(s) that sometimes miss certain fields
in a row, which obviously causes issues when I try the /update/csv handler.

Scenario 1:
The CSV handler is quite fast; however, when it picks up a line that does not
have all the fields due to a missing delimiter, the entire import fails.
So, is there a way to do an on-error-skip type of scenario?
I have checked the 6.3 ref guide and the web, but no luck.

Scenario 2:
I try to use my own DIH and then configure my schema accordingly; however, I
am trying to use the separator parameter, and it seems to not be working.
It looks like the data always just goes to rawLine, which then means that the
separator effectively means nothing?

I am trying to not go too custom, so does anybody know of a "standard" way
of getting the data in?

Regards
Moenieb









